1117799 – tests fail when texlive is compiled with gcc-4.9.0-12.fc21

Summary: tests fail when texlive is compiled with gcc-4.9.0-12.fc21

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gcc
Sub Component:
Version:	rawhide
Hardware:	s390x
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	ZedoraTracker

Reported:	2014-07-09 12:17 UTC by Michal Toman
Modified:	2015-03-23 00:42 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-01-07 14:47:44 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
pdftex0.i (1.17 MB, text/plain) 2014-07-09 12:18 UTC, Michal Toman	no flags
backtrace (57.97 KB, text/plain) 2014-07-09 12:19 UTC, Michal Toman	no flags
proposed change for the gcc package (3.21 KB, patch) 2014-07-24 20:03 UTC, Dan Horák	no flags

Description Michal Toman 2014-07-09 12:17:41 UTC

pdftexdir/wprob.test and pdftexdir/pdfimage.test fail when texlive is compiled with gcc-4.9.0-12.fc21.s390x with -O2, while -O1 and -O0 work correctly.

I have tracked the problem down to function macrocall() from pdftex0.c. Prefixing the function with __attribute__((optimize (0))) fixes the problem. After calling macrocall() the program continues running but one of the subsequent calls to getnext() causes a segfault.
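The isolation step described above can be sketched roughly as follows. This is only an illustration, not code from pdftex0.c: suspect_o0() and reference_fn() are hypothetical stand-ins for macrocall(), and the NO_OPT macro is an assumed convenience wrapper so the file still builds with compilers that do not know the GCC optimize attribute.

```c
#include <assert.h>

/* Assumed helper macro: expands to GCC's per-function optimize attribute,
 * and to nothing on compilers that do not support it. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((optimize(0)))
#else
#define NO_OPT
#endif

/* Hypothetical stand-in for the suspect function; it is compiled
 * without optimization regardless of the -O level of the file. */
NO_OPT
int suspect_o0(int x)
{
    return x * 2 + 1;
}

/* Same body without the attribute, compiled at the file's -O level. */
int reference_fn(int x)
{
    return x * 2 + 1;
}

/* If the two copies ever disagree, the optimizer is a likely suspect. */
void check_suspect_o0(void)
{
    for (int i = -4; i <= 4; i++)
        assert(suspect_o0(i) == reference_fn(i));
}
```

Moving the attribute from one function to the next, and rerunning the failing test each time, narrows a whole-file -O2 failure down to a single function.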

Attaching the preprocessed source and backtrace. Let me know if I can provide any more information.

Comment 1 Michal Toman 2014-07-09 12:18:29 UTC

Created attachment 916753 [details]
pdftex0.i

Comment 2 Michal Toman 2014-07-09 12:19:01 UTC

Created attachment 916754 [details]
backtrace

Comment 3 Dan Horák 2014-07-09 12:26:26 UTC

Just a note: pdftex0.c is generated from a Pascal source file, which is in turn generated from the native *.web TeX source file, so please be lenient.

Comment 4 Jakub Jelinek 2014-07-09 12:42:53 UTC

So, like with the previous bug report, we need a short, self-contained testcase first.
Does the problem still exist if you replace "optimize (0)" with "noinline, noclone" (i.e. if macrocall is optimized normally, but is neither inlined nor cloned)?  What about if you compile with -O2 -fno-inline?  Thus, is the problem really in macrocall itself or perhaps in some code inlined into it?
How many calls to macrocall are there before it crashes?
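A rough sketch of what these two experiments could look like in source form. Everything here is hypothetical: suspect_noinline() stands in for macrocall(), KEEP_SEPARATE is an assumed wrapper around the GCC attributes, and the call counter is just one simple way to answer the "how many calls" question.

```c
#include <assert.h>

/* Assumed helper macro: the function is still optimized normally,
 * but GCC may neither inline it nor create clones of it. */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_SEPARATE __attribute__((noinline, noclone))
#else
#define KEEP_SEPARATE
#endif

/* Counts entries into the suspect function. */
static unsigned long suspect_calls;

/* Hypothetical stand-in for the suspect function. */
KEEP_SEPARATE
int suspect_noinline(int x)
{
    suspect_calls++;
    return x + 1;
}

/* Number of calls made so far; can be printed from a debugger
 * or from a crash handler. */
unsigned long suspect_call_count(void)
{
    return suspect_calls;
}

/* Aborts if the counter differs from the expected value. */
void expect_calls(unsigned long expected)
{
    assert(suspect_calls == expected);
}
```

If the failure persists with the function kept out of line, the miscompilation is in the function body itself rather than in code inlined into it or in a caller.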

Comment 5 Jakub Jelinek 2014-07-09 16:21:25 UTC

-O2 -fno-inline still reproduces it. Compiling the whole source file with -O0 and just the macrocall function with __attribute__((noinline, noclone, optimize (2))) reproduces it too. With -mno-lra it works.
Command line options (with __attribute__((noinline, noclone, optimize (2))) added
to macrocall):
-fexceptions -fstack-protector-strong -m64 -march=z9-109 -mtune=z10 -fno-strict-aliasing -fno-inline -O0
There are some -Wmaybe-uninitialized warnings, but initializing the variables doesn't seem to fix the problem.
From debugging it seems the problem in the macrocall function is that the n variable is zero when it should be 1 near the end of the function.
In the assembly, it seems n has been hoisted to %r15+196 (a 32-bit word) or, because of endianness, just the low 8 bits of that word at %r15+199:
        mvi     199(%r15),0
corresponds to n = 0;
then for:
        else pstack [n ]= mem [memtop - 3 ].hh .v.RH ;
        incr ( n ) ;
I'm seeing:
.L2853:
        .loc 1 10331 0
        lgfr    %r1,%r1
        llgc    %r2,199(%r15)
        larl    %r8,pstack
        sllg    %r1,%r1,3
        sllg    %r2,%r2,2
        ly      %r1,-24(%r1,%r12)
        st      %r1,0(%r2,%r8)
.L2854:
        .loc 1 10332 0
        l       %r1,196(%r15)
        ahi     %r1,1
        st      %r1,180(%r15)
        stc     %r1,184(%r15)
        .loc 1 10333 0
where the first basic block looks correct: it reads the byte at 199+%r15. The second, which is supposed to do ++n, looks wrong: while it reads the right value, it stores it elsewhere.
          printint ( n ) ;
a few lines below again assumes %r15+199 (i.e., given the misplaced increment, the old n value rather than the new n value):
        .loc 1 10337 0
        llgc    %r2,199(%r15)
        brasl   %r14,zprintint
then:
          showtokenlist ( pstack [n - 1 ], -268435455L , 1000 ) ;
looks like:
        .loc 1 10339 0
        llc     %r1,183(%r15)
        lgfi    %r3,-268435455
        lghi    %r4,1000
        ahi     %r1,-1
        lgfr    %r1,%r1
        sllg    %r1,%r1,2
        lgf     %r2,0(%r1,%r8)
        brasl   %r14,zshowtokenlist
and there it uses the low byte of the %r15+180 value (i.e. the new incremented n).
          pstack [n ]= mem [memtop - 3 ].hh .v.RH ;
became:
        .loc 1 10165 0
        ic      %r3,199(%r15)
        larl    %r1,memtop
        larl    %r8,pstack
.LBB23:
        lghi    %r9,0
.LBE23:
        lgf     %r2,0(%r1)
        llgcr   %r1,%r3
.LBB24:
        lgr     %r10,%r1
        aghi    %r10,1
.LBE24:
        sllg    %r2,%r2,3
        sllg    %r1,%r1,2
        ly      %r2,-24(%r2,%r12)
        st      %r2,0(%r1,%r8)
                pstack [n ]= mem [memtop - 3 ].hh .v.RH ;
        .loc 1 10226 0
        ic      %r3,199(%r15)
        larl    %r1,memtop
        larl    %r8,pstack
...
and finally:
  if ( n > 0 ) 
        .loc 1 10350 0
        llc     %r0,199(%r15)
        .loc 1 10348 0
        l       %r1,0(%r1)
        st      %r1,16(%r12)
        .loc 1 10350 0
        ltr     %r0,%r0
        .loc 1 10349 0
        l       %r1,0(%r6)
        st      %r1,8(%r12)
        .loc 1 10350 0
        je      .L2822
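
The slot mismatch in the listing above can be replayed in a small C model. This is a sketch under assumptions, not a reproducer: frame[] is a hypothetical stand-in for the memory addressed relative to %r15, the offsets are copied from the assembly, and bytes 196..198 are assumed to be zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models the stack frame; index = displacement from %r15. */
uint8_t frame[256];

/* Big-endian 32-bit load, as on s390x. */
static uint32_t load32(unsigned off)
{
    return ((uint32_t)frame[off] << 24) | ((uint32_t)frame[off + 1] << 16) |
           ((uint32_t)frame[off + 2] << 8) | (uint32_t)frame[off + 3];
}

/* Big-endian 32-bit store. */
static void store32(unsigned off, uint32_t v)
{
    frame[off]     = (uint8_t)(v >> 24);
    frame[off + 1] = (uint8_t)(v >> 16);
    frame[off + 2] = (uint8_t)(v >> 8);
    frame[off + 3] = (uint8_t)v;
}

/* Replays "n = 0" followed by the miscompiled "incr(n)" from .L2854. */
void replay_incr(void)
{
    memset(frame, 0, sizeof frame);
    frame[199] = 0;               /* mvi 199(%r15),0   : n = 0        */

    uint32_t r1 = load32(196);    /* l   %r1,196(%r15)                */
    r1 += 1;                      /* ahi %r1,1                        */
    store32(180, r1);             /* st  %r1,180(%r15) : wrong slot   */
    frame[184] = (uint8_t)r1;     /* stc %r1,184(%r15) : wrong slot   */

    assert(frame[199] == 0);      /* printint and "if (n > 0)" read the stale n */
    assert(frame[183] == 1);      /* showtokenlist reads the incremented n      */
}
```

In this model the byte at offset 199 still holds 0 after the increment, which matches the observation that n is zero near the end of the function when it should be 1.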

Comment 6 Jakub Jelinek 2014-07-09 16:34:16 UTC

In *.ira dump we have:
(insn 673 675 674 85 (parallel [
            (set (reg:SI 583)
                (plus:SI (subreg:SI (reg/v:QI 281 [ n ]) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 33 %cc))
        ]) pdftex0.c:10332 327 {*addsi3}
     (expr_list:REG_DEAD (reg/v:QI 281 [ n ])
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))
(insn 674 673 676 85 (set (reg/v:QI 281 [ n ])
        (subreg:QI (reg:SI 583) 3)) pdftex0.c:10332 74 {*movqi}
     (nil))
and *.reload turns this into:
         Choosing alt 2 in insn 673:  (0) d  (1) 0  (2) K {*addsi3}
      Creating newreg=851 from oldreg=583, assigning class GENERAL_REGS to r851
  673: {r851:SI=r851:SI+0x1;clobber %cc:CC;}
      REG_DEAD r281:QI
      REG_UNUSED %cc:CC
    Inserting insn reload before:
 1268: r851:SI=r281:QI#0
    Inserting insn reload after:
 1269: r583:SI=r851:SI

(insn 1268 675 673 85 (set (reg:SI 1 %r1 [583])
        (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
                (const_int 196 [0xc4])) [0 %sfp+-36 S4 A8])) pdftex0.c:10332 67 {*movsi_zarch}
     (nil))
(insn 673 1268 1301 85 (parallel [
            (set (reg:SI 1 %r1 [583])
                (plus:SI (reg:SI 1 %r1 [583])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 33 %cc))
        ]) pdftex0.c:10332 327 {*addsi3}
     (nil))
(note 1301 673 1300 85 NOTE_INSN_DELETED)
(insn 1300 1301 674 85 (set (mem/c:SI (plus:DI (reg/f:DI 15 %r15)
                (const_int 180 [0xb4])) [0 %sfp+-52 S4 A32])
        (reg:SI 1 %r1 [583])) pdftex0.c:10332 67 {*movsi_zarch}
     (nil))
(insn 674 1300 1299 85 (set (mem/c:QI (plus:DI (reg/f:DI 15 %r15)
                (const_int 184 [0xb8])) [0 %sfp+-48 S1 A64])
        (reg:QI 1 %r1 [orig:583+3 ] [583])) pdftex0.c:10332 74 {*movqi}
     (nil))
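
For reference, what insns 673 and 674 are meant to compute can be written as a tiny C model. This is a sketch with assumptions: incr_n() is a hypothetical name, and the widening is modelled as zero extension even though a paradoxical subreg leaves the upper bits undefined. Only the low byte of the sum is kept, so the result does not depend on that choice.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the pre-reload RTL for incr(n) with an 8-bit n. */
uint8_t incr_n(uint8_t n)
{
    /* insn 673: r583:SI = (subreg:SI n 0) + 1 */
    uint32_t r583 = (uint32_t)n + 1u;

    /* insn 674: n = (subreg:QI r583 3); on a big-endian target,
     * byte 3 of an SImode value is its low-order byte. */
    return (uint8_t)(r583 & 0xffu);
}

/* The new value must land back in n's own location, wrapping at 8 bits. */
void check_incr_n(void)
{
    assert(incr_n(0) == 1);
    assert(incr_n(255) == 0);
}
```

After reload, the source operand comes from n's slot at %r15+196, but the result goes to %r15+180 and %r15+184, so the location that later reads of n use at %r15+199 is never updated.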

Vlad, can you please have a look?  Thanks.

Comment 7 Vladimir Makarov 2014-07-09 20:47:39 UTC

The problem is in LRA inheritance code.  That is pretty complicated even for me.  So, Jakub, I don't think you can fix this easily.  I am going to work on this but it might take a few days.

Comment 8 Jakub Jelinek 2014-07-09 21:28:14 UTC

Ok, thanks.  I'm going to release 4.9.1-rc1 (and thus 4.9.1 too) without it then.

Comment 9 Vladimir Makarov 2014-07-11 17:43:51 UTC

I've committed a patch fixing the bug into the trunk.  Jakub, when should I do the same for gcc-4.9-branch?

Comment 10 Jakub Jelinek 2014-07-11 18:06:48 UTC

I think it will not hurt to have it for a few days on the trunk, so I wouldn't rush it into 4.9.1 after rc1 went out.  So, if all goes well, can you commit it after 4.9.1 release (tentatively Thursday), e.g. during Cauldron?

Comment 11 Vladimir Makarov 2014-07-11 18:14:56 UTC

Ok, I can do it during Cauldron.

Comment 12 Dan Horák 2014-07-17 11:58:46 UTC

Jakub, gcc-4.9.1-2.fc21 is still without the fix for this issue, correct?

Comment 13 Jakub Jelinek 2014-07-17 12:13:11 UTC

Yes, forgot about it, could have included it as a patch.  Vlad will hopefully check it in soon.

Comment 14 Jakub Jelinek 2014-07-17 16:23:57 UTC

BTW, 4.9.1-2.fc21 failed to build on s390* due to some texinfo issue. Is that a chicken-and-egg problem (fixing texinfo requires a fixed texlive)?

Comment 15 Dan Horák 2014-07-18 06:22:48 UTC

There are still a couple of Perl modules that haven't been rebuilt yet, meaning they are affected by the glibc ABI change (s390 only, pre 2.19). We haven't gotten through the F-21 mass rebuild yet, which will fix it. There is a special build target (f21-glibc) that has those broken packages replaced. I'll take care of the 4.9.1-2.fc21 build.

Comment 16 Michal Toman 2014-07-24 19:50:32 UTC

Tested with http://s390.koji.fedoraproject.org/koji/taskinfo?taskID=1442611 (4.9.1-2.fc21 + the fix for this bug) and the problem is fixed. Is there any chance to include the fix in 4.9?

Comment 17 Dan Horák 2014-07-24 20:03:53 UTC

Created attachment 920715 [details]
proposed change for the gcc package

I could commit and build an updated gcc if I get the green light.

Comment 18 Jeff Law 2014-07-24 20:06:24 UTC

Marek, can you pull the change referenced in c#17 into f21 & rawhide to unblock the s390 guys?

Thanks,
Jeff

Comment 19 Marek Polacek 2014-07-25 12:04:26 UTC

I was about to, but I got delayed setting up the FAS stuff (to actually be able to clone the repo, etc.).  Meanwhile, Dan offered to do the commit, so Dan, please go ahead.

Comment 20 Jakub Jelinek 2015-01-07 14:43:03 UTC

This should be fixed in f21 by now, right?

Comment 21 Dan Horák 2015-01-07 14:46:38 UTC

(In reply to Jakub Jelinek from comment #20)
> This should be fixed in f21 by now, right?

yes, it is
