Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 442285
Summary: | Solve real cause of #437701 and rebuild memtest86+ with GCC 4.3 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Robert Scheck <redhat-bugzilla> |
Component: | memtest86+ | Assignee: | Jarod Wilson <jarod> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | anton, dcantrell, jakub, jarod, promac, robatino |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-12-25 17:42:02 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 438944, 446451 |
Description
Robert Scheck
2008-04-13 21:41:11 UTC
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Ping? Maybe Jakub is able to provide some help here? Ping? This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Warren - any news here? Jakub, can you please help investigating here to solve the issue...I think, you're a good C guy? Adding Jesse that we maybe can get rid of that before Fedora 11? memtest86+ version 2.10 announcement mentioned: Warning : GCC 4.2+ is not yet supported. Thanks to the new pointers over/underflows limitations introduced with that version ! Memtest86+ compiled with GCC 4.2+ will result in a non-working binary (hangs in the first seconds). We're working on a fix but there is really many parts of the code to check & change everywhere. So, please use GCC 4.1 instead. So this isn't our fault. This is heavy lifting that must happen upstream. This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component. ... and it looks like it isn't happening for F11. I don't think we'd delay the release for this, moving over to target. This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping I've been asked to look into this, and have been getting up to speed on things intertwined with other tasks, and just started in on some bisection of .o files. I've got a build with all gcc 4.4 build .o files, except for test.o and patn.o, and its already most of the way through a test run (already way further than a pure gcc 4.4 build got). Assuming this pass completes, I'll check if its only one or both of these causing failure, then start looking at the respective source files themselves. Its looking like fixing this won't actually be all that bad. Narrowed the failure down to test.o, looking over its source now... Does it fail even with -O0 when built with gcc 4.4? Or when built with -O2 -fno-inline -fno-inline-small-functions? If inlining isn't necessary to reproduce the failure, you can as well continue doing the binary search among functions within that file (put half of functions into one file, half in another, remove static and/or add prototypes where needed). Or, if it doesn't fail at -O0, you can use __attribute__((__optimize__(0))) on various functions to compile them non-optimized, while keeping the rest optimized. (In reply to comment #12) > Does it fail even with -O0 when built with gcc 4.4? It works just fine built with -O0 > Or when built with -O2 > -fno-inline -fno-inline-small-functions? Fails the same as before with these options. (stock is -Os, fwiw) > If inlining isn't necessary to > reproduce the failure, you can as well continue doing the binary search among > functions within that file (put half of functions into one file, half in > another, remove static and/or add prototypes where needed). > Or, if it doesn't fail at -O0, you can use __attribute__((__optimize__(0))) on > various functions to compile them non-optimized, while keeping the rest > optimized. I'll give the latter a shot next. Thanks much! So given that it was failing on test #2, I'd been figuring it was most likely the function addr_tst2() in test.c at fault, so I threw __attribute__((__optimize(0))) on that first, and that indeed the resulting binary does work, so it confirms my suspicion about the guilty function. Still looking over the code, but after a few reads through, I'm not sure what it could be. I'm assuming probably something in the /* Check for overflow */ area is getting optimized out, and that's probably where I'll tinker next, but its now 7pm on Friday night, and the kids are wanting some attention. ;) Oh, wait... I think I might see it now... But yeah, kid time first... Relying on undefined signed overflow or something similar? if (pe + SPINSZ > pe) or if (pe - SPINSZ < pe) or if (pe + SPINSZ*4 > pe) for pe being pointer is obviously all optimized away, pointers in C never wrap around and so for positive SPINSZ all these can be assumed always true. Just use if ((((uintptr_t)pe) + SPINSZ * sizeof (ulong)) > (uintptr_t)pe) and similar. Alternatively you could use -fno-strict-overflow, but it is better to just fix the buggy code. > Just use
> if ((((uintptr_t)pe) + SPINSZ * sizeof (ulong)) > (uintptr_t)pe)
> and similar.
So I tried this (actually, a number of different variations of it too), but its still blowing up the same as before. Must be doing something wrong, or missing something else... Going to table it 'til Monday, but we're close...
In a hurry to try to get this figured out last week, I think I may have stumbled on one of the last few steps... After back-tracking, building with -O0 works, but the attribute addition to addr_tst2() only route is failing, same as before. Going to retrace my steps from there forward now... Okay... So I was right the first time, adding the attribute to addr_tst2() does work. However, its definitely not the pe comparison that is tripping things up (not to say that it doesn't also need to be fixed). Looking like somehow the BAILR bit is getting optimized out. Here's the minimal working diff: --- rpmbuild/BUILD/memtest86+-2.11/test.c 2008-11-15 19:18:14.000000000 -0500 +++ memtest86+-2.11/test.c 2009-07-27 13:20:34.885941221 -0400 @@ -130,6 +130,15 @@ void addr_tst1() } } +static int bail_check() +{ + if (bail) + return 1; + else + return 0; + +} __attribute__((__optimize__(0))) + /* * Memory address test, own address */ @@ -181,7 +190,9 @@ void addr_tst2() : "D" (p), "d" (pe) ); do_tick(); - BAILR + //BAILR + if (bail_check()) + return; } while (!done); } @@ -243,7 +254,9 @@ void addr_tst2() : "ecx" ); do_tick(); - BAILR + //BAILR + if (bail_check()) + return; } while (!done); } } BAILR is simply: #define BAILR if (bail) return; Oh fun. The plot thickens. Its not actually addr_tst2() where we're running into trouble. That correlates to Test #1, not #2. I'd assumed it started at Test #1, because Test #0 happens so fast, you never see it on-screen. Test #2, where we're dying, is actually "Moving inversions, ones & zeros", which corresponds to movinvr(). So why the above helps, I dunno... Digging some more... Oh crap. I think I'm dumb, and don't know what I'm doing with where I'm putting the attribute... Just looked at the docs, yep, I'm Doing It Wrong. Ugh. Ignore the last several comments, lemme redo this the right way. Alright, I'm back on track now... And I have a theory as to what is going on. It seems that gcc thinks 'bail' is always zero, and so the BAILR lines are always optimized out. bail is defined in main.c, then referenced as an extern int in both test.c and config.c. It can be set to non-zero by config.c, but is never touched by test.c. Pretty much any effort I take to NOT optimize something that touches bail within test.c results in a working Test #2, which would explain why my earlier misguided attempts worked, even though I was prodding the wrong function... I don't know if this is a gcc buglet, or if the code is just doing something it really shouldn't... Or if I'm way off and don't know what I'm talking about. ;) gcc -S -fverbose-asm -Wall -march=i486 -m32 -Os -fomit-frame-pointer -fno-builtin -ffreestanding test.c grep bail test.s | sort | uniq -c 21 cmpl $0, bail #, bail grep BAILR test.c | wc -l 21 So, at least in test.c that doesn't seem to be the case. Have you tried compiling test.c with -march=i486 -Os -fomit-frame-pointer -fno-builtin -fftreestanding -fno-strict-overflow (or s/-fno-strict-overflow/-fwrapv)? > So, at least in test.c that doesn't seem to be the case. Huh, so it isn't. I'm rather confused why simply un-optimizing the parts that touch bail seems to make a difference then... > Have you tried compiling test.c with -march=i486 -Os -fomit-frame-pointer > -fno-builtin -fftreestanding -fno-strict-overflow I thought I had, but apparently not. Just tried that, and the resulting binary does work as expected. (In reply to comment #18) > > Just use > > if ((((uintptr_t)pe) + SPINSZ * sizeof (ulong)) > (uintptr_t)pe) > > and similar. > > So I tried this (actually, a number of different variations of it too), but its > still blowing up the same as before. Must be doing something wrong, or missing > something else... Going to table it 'til Monday, but we're close... Finally figured it out... This doesn't work: if ((((uintptr_t)pe) + SPINSZ) > (uintptr_t)pe) But this does: if ((uintptr_t)(pe + SPINSZ) > (uintptr_t)pe) Is that expected? Moving to rawhide and reassigning. So while casting the result of the addition gets us through test #2 (and 3-6), test #7 still fails spectacularly, for different reasons. Marking the movinvr() function as -O0 gets it working, so I started refactoring the code to try to narrow down where it was going wrong. If I simply move this block (the first do while loop) out of movinvr() and into its own function, called by movinvr(), then a full -Os compiled build works. Without moving the code, it results in test #7 failing. ----8<---- do { /* Check for overflow */ if ((uintptr_t)(pe + SPINSZ) > (uintptr_t)pe) { pe += SPINSZ; } else { pe = end; } if (pe >= end) { pe = end; done++; } if (p == pe ) { break; } /* Original C code replaced with hand tuned assembly code */ /* for (; p < pe; p++) { *p = rand(); } */ asm __volatile__ ( "jmp L200\n\t" ".p2align 4,,7\n\t" "L200:\n\t" "call rand\n\t" "movl %%eax,(%%edi)\n\t" "addl $4,%%edi\n\t" "cmpl %%ebx,%%edi\n\t" "jb L200\n\t" : "=D" (p) : "D" (p), "b" (pe) : "eax" ); do_tick(); BAILR } while (!done); ----8<---- I'm pretty much at a loss why moving that code *unaltered* beyond putting it in a separate function works... Also works within the loop if movinvr() is tagged with __attribute__((__optimize__(2))). About the only thing left that I can think of is perhaps a register in addition to eax is getting clobbered at some point. Will do a bit of poking in this direction... Indeed, simply adding edx to the clobber list makes things fully functional without the loop moved external to movinvr(). I'm guessing perhaps the call to rand() somehow stomps on it, since the rest of the asm doesn't use it at all... Going to call it good with this change, flip the switch, and rebuild memtest86+ w/o resorting to gcc34 now. We've been including gcc 4.4-built memtest86+ packages in rawhide for about a month now, haven't seen any new bug reports come in, so I'll go ahead and close this as fixed. memtest86+-4.00-2.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/memtest86+-4.00-2.fc11 Reopening, because of "BuildRequires: compat-gcc-34" - why do we need this? We don't need this, fixed with memtest86+-4.00-3.fc13. memtest86+-4.00-2.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report. |