Bug 1555151
Summary: | gcc: uninitialized value on armhfp with -O2 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Jerry James <loganjerry>
Component: | gcc | Assignee: | Jakub Jelinek <jakub>
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 31 | CC: | davejohansen, dmalcolm, fweimer, jakub, jwakely, law, mpolacek, msebor, nickc, sipoyare
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | armhfp | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-11-24 16:40:41 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1555753 | |
Attachments: |
|
I see constructors that don't initialize members:

tuple() {};
type_n() {};

Does changing them to avoid uninitialized members make any difference?

Those constructors look especially suspect given the valgrind output shows:

flint::tuple<type_n<2, true>, flint::empty_tuple>

tuple<type_n<2, true>, empty_tuple> has a default constructor that doesn't explicitly initialize its type_n<2, true> member, and that member has a default constructor which doesn't initialize its int member.

Try:

type_n() : payload() {}

Thanks for the comments, Jonathan. Sorry to take so long to get back around to this. I've had soooo many packages break in the last 3 months, it's been difficult to find time for them all.

I changed the constructors like so:

tuple() : head(), tail() {}
type_n() : payload(0) {}

That didn't change the outcome.

This is curious: if I add either -fsanitize=address or -fsanitize=undefined to the compiler flags, the resulting executable passes the test, displays no errors from the sanitizer, and shows no errors under valgrind.

I worked my way through the optimizer differences between -O1 and -O2, and found that -fno-tree-tail-merge also makes the test pass.

To summarize, the tests all pass and valgrind shows no complaints if:
- Any architecture but 32-bit ARM is used
- GCC 7 or earlier is used
- -O0 or -O1 is given
- -fsanitize=address or -fsanitize=undefined is given
- -fno-tree-tail-merge is given

(In reply to Jerry James from comment #3)
> Thanks for the comments, Jonathan. Sorry to take so long to get back around
> to this. I've had soooo many packages break in the last 3 months, it's been
> difficult to find time for them all.

Yes, I've noticed how much time you've had to spend in bugzilla!

> I changed the constructors like so:
>
> tuple() : head(), tail() {}
> type_n() : payload(0) {}
>
> That didn't change the outcome.

OK, thanks for checking. If that had been the problem I'd have expected it to show up on other arches anyway.
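For readers following the constructor discussion, here is a minimal sketch of what the suggested change does. It is not the flint source: the type and member names (tuple, type_n, empty_tuple, head, tail, payload) are taken from the comments above, while the namespace, the simplified template parameters, and the first_payload() helper are assumptions made for illustration. With `payload()` in the initializer list the int member is value-initialized to zero, whereas `type_n() {}` would leave it indeterminate.

```cpp
#include <cassert>

namespace ctor_sketch {  // hypothetical namespace, not flint's

struct empty_tuple {};

template <int N, bool B>
struct type_n {
    int payload;
    // Value-initializes payload, so it is 0 rather than indeterminate.
    type_n() : payload() {}
};

template <class Head, class Tail>
struct tuple {
    Head head;
    Tail tail;
    // Explicitly initializes both members via their default constructors.
    tuple() : head(), tail() {}
};

// Hypothetical helper: build the type named in the valgrind output and
// read back the payload of its first element.
inline int first_payload() {
    tuple<type_n<2, true>, empty_tuple> t;
    assert(t.head.payload == 0);
    return t.head.payload;
}

}  // namespace ctor_sketch
```

As reported above, making this change in the real test did not alter the outcome, so the sketch only shows what was ruled out.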
I should have mentioned back in May that I added -fno-tree-tail-merge to the flint build flags, on 32-bit ARM only. But it would be nice if somebody could figure out what is really going wrong here, in particular whether this is a gcc bug or a flint bug.

This bug appears to have been reported against 'rawhide' during the Fedora 29 development cycle. Changing version to '29'.

The behavior is unchanged with gcc 9.0.1-0.4.fc30.armv7hl. The test passes with -fno-tree-tail-merge, but fails with just -O2. Please advise whether this is a gcc bug or a bug in the test code.

Created attachment 1538290 [details]
Reduced test case
I've got a smaller, though sadly less comprehensible, test case due to C-Reduce. For the bug to manifest, all of the following seem to be needed:
- The variable declaration in the inner block must have the same name as the one declared after the inner block.
- Both the inner block and the outer block must end with the same function call (changing exit(0) to return 0 makes the bug stop manifesting).
- Complex variable types. My attempts at changing the nested template types to something simpler have also made the bug stop manifesting.
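The first two conditions can be pictured with the following rough sketch. It is not the attached test case: `finish()` stands in for exit(0) so that the function can return and be checked, and `wrapper`, `run()`, and the namespace are hypothetical names. The point is only the shape: an inner block and the code after it each declare a variable of the same name and end with the same call, which is the kind of identical tail that GCC's tail-merge pass (disabled by the -fno-tree-tail-merge workaround mentioned earlier) looks for.

```cpp
#include <cassert>
#include <vector>

namespace tail_sketch {  // hypothetical namespace

// Stand-in for exit(0): records the call instead of terminating.
static std::vector<int> finish_log;
static void finish(int code) { finish_log.push_back(code); }

// Trivial stand-in for the nested template types of the real test.
struct wrapper {
    int payload;
    explicit wrapper(int p) : payload(p) {}
};

static int run(bool take_inner) {
    if (take_inner) {
        wrapper value(1);  // same name as the declaration after the block
        finish(0);         // inner block ends with this call
        return value.payload;
    }
    wrapper value(2);      // declared after the inner block
    finish(0);             // outer block ends with the same call
    return value.payload;
}

// Each path must keep its own value even though the tails look alike.
static void check() {
    assert(run(true) == 1);
    assert(run(false) == 2);
}

}  // namespace tail_sketch
```

This sketch is not expected to reproduce the failure; per the third condition, simplifying the types made the bug stop manifesting.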
I get FAIL printed and valgrind complaining about "Conditional jump or move depends on uninitialised value" on the #c8 testcase when it is built with either -O0 or -O2, with g++ 9.0.1 as well as 8.2.1. Are you sure it is valid?

And on x86_64 as well (also at -O0).

Yes, I let C-Reduce reduce to invalid code. :-( I'm working on another reduction right now that will hopefully avoid that issue. I'll attach it here when it is ready.

Created attachment 1539610 [details]
Reduced test case
I seem to be having trouble preventing C-Reduce from introducing uninitialized values. In the meantime, then, here is a hand-reduced version with no preprocessor directives. However, note that the recipe for triggering the issue has now changed.
When built with "g++ -O1 -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -Wall -Wextra -o test test.cpp", the program exits with exit code 1.
When built with "g++ -O1 -fno-tree-ter -Wall -Wextra -o test test.cpp", the program exits with exit code 0.
In neither case are any warnings emitted. None of valgrind, -fsanitize=address, or -fsanitize=undefined report any issues.
Argh. That's -fno-tree-sra to get exit code 0, not -fno-tree-ter. Sorry about that.

Created attachment 1539611 [details]
Reduced test case
I've bisected the #c0 testcase with the #c3 fixes to the GCC change http://gcc.gnu.org/r255510, which doesn't mean anything, apart from that we should study how it changed the code generation and whether something is wrong.

This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle. Changing version to '31'.

This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed. |
Created attachment 1407796 [details]
Test case showing problem on 32-bit ARM

Description of problem:
The flint package failed the mass rebuild on 32-bit ARM, due to a failing test. The test passes on all other architectures, and passes on all architectures including 32-bit ARM with previous versions of gcc (F27 and older). Running under valgrind shows a lot of "conditional jump or move depends on uninitialised value(s)" warnings, none of which appear when run under valgrind on other architectures (or, again, on 32-bit ARM when compiled with older versions of gcc). The problem does not appear at optimization levels -O0 or -O1, but does at -O2. I have reduced the failing test down to sources which I will attach to this bug.

This is the first warning issued by valgrind:

==9442== Conditional jump or move depends on uninitialised value(s)
==9442==    at 0x11B7C: std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_get_insert_unique_pos(int const&) (stl_tree.h:2055)
==9442==    by 0x11CA3: std::pair<std::_Rb_tree_iterator<int>, bool> std::_Rb_tree<int, int, std::_Identity<int>, std::less<int>, std::allocator<int> >::_M_insert_unique<int const&>(int const&) (stl_tree.h:2106)
==9442==    by 0x1200F: insert (stl_set.h:511)
==9442==    by 0x1200F: doit (test.cpp:322)
==9442==    by 0x1200F: doit (test.cpp:323)
==9442==    by 0x1200F: std::set<int, std::less<int>, std::allocator<int> > values<flint::tuple<type_n<0, true>, flint::tuple<type_n<1, true>, flint::tuple<type_n<2, true>, flint::empty_tuple> > > >(flint::tuple<type_n<0, true>, flint::tuple<type_n<1, true>, flint::tuple<type_n<2, true>, flint::empty_tuple> > > const&) (test.cpp:347)
==9442==    by 0x11253: main (test.cpp:396)

Version-Release number of selected component (if applicable):
gcc-c++-8.0.1-0.17.fc29.armv7hl

How reproducible:
Always

Steps to Reproduce:
1. Build the attached sources with g++ -O2 -o test test.cpp
2. Run ./test
3.

Actual results:
On 32-bit ARM, the test fails:
FAIL test.cpp:396: assertion vals1.size() == 4 failed
On all other Fedora architectures, the test passes.

Expected results:
The test should pass on all architectures.

Additional info: