Bug 1883457 - coreos-installer test segfaults with rust-1.46/llvm-11
Summary: coreos-installer test segfaults with rust-1.46/llvm-11
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: llvm
Version: 34
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tom Stellard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
Reported: 2020-09-29 09:23 UTC by Dan Horák
Modified: 2021-06-18 07:19 UTC (History)
CC List: 14 users

Fixed In Version: llvm-11.0.0-1.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | rust-lang rust issues 77382 | 0 | None | closed | coreos-installer test segfaults on s390x-unknown-linux-gnu | 2021-01-12 13:07:17 UTC
LLVM | 47736 | 0 | P | RESOLVED | SystemZ reordered store/compare clobbers CC | 2021-01-12 13:07:21 UTC

Description Dan Horák 2020-09-29 09:23:25 UTC
A test in the coreos-installer build crashes with a segfault when built in a recent rawhide buildroot.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-e60dceed92ae9902

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (SIGSEGV) (core dumped)


I suspect something is wrong in llvm 11 ...


Version-Release number of selected component (if applicable):
BAD = rust-1.46.0-2.fc34 + llvm-libs-11.0.0-0.8.rc3.fc34
OK = rust-1.45.2-1.fc33 + llvm10-libs-10.0.0-9.fc34

How reproducible:
100%

Steps to Reproduce:
1. rebuild rust-coreos-installer

Comment 1 Dan Horák 2020-09-29 09:30:11 UTC
backtrace from gdb

(gdb) where
#0  0x000002aa0c3616c4 in core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#1  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#2  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#3  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#4  core::ptr::drop_in_place () at /builddir/build/BUILD/rustc-1.46.0-src/src/libcore/ptr/mod.rs:184
#5  <libcoreinst::source::FileLocation as libcoreinst::source::ImageLocation>::sources (self=0x3ffcc3fc4f8) at src/source.rs:135
#6  libcoreinst::download::tests::test_write_image_limit () at src/download.rs:489
#7  0x000002aa0c36ccb2 in std::panicking::try::do_call ()
#8  0x000002aa0c3a224c in __rust_try ()
#9  0x000002aa0c39ecbe in test::run_test::run_test_inner::{{closure}} ()
#10 0x000002aa0c39e4b6 in test::run_test::run_test_inner ()
#11 0x000002aa0c39ce6e in test::run_test ()
#12 0x000002aa0c39559c in test::run_tests ()
#13 0x000002aa0c37ef4e in test::console::run_tests_console ()
#14 0x000002aa0c392a54 in test::test_main ()
#15 0x000002aa0c3943d4 in test::test_main_static ()
#16 0x000002aa0c310a72 in std::rt::lang_start::{{closure}} () at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#17 0x000002aa0c664446 in std::panicking::try::do_call ()
#18 0x000002aa0c66daa4 in __rust_try ()
#19 0x000002aa0c665156 in std::rt::lang_start_internal ()
#20 0x000002aa0c310a58 in std::rt::lang_start (main=<optimized out>, argc=<optimized out>, argv=<optimized out>) at /builddir/build/BUILD/rustc-1.46.0-src/src/libstd/rt.rs:67
#21 0x000003ff9b8abbda in __libc_start_main () from /lib64/libc.so.6
#22 0x000002aa0c30cdf4 in _start ()

Comment 2 Dan Horák 2020-09-29 10:09:23 UTC
The build in Koji is https://koji.fedoraproject.org/koji/taskinfo?taskID=52413150

Comment 3 Dan Horák 2020-09-29 14:22:42 UTC
I have made a build of rust 1.46 with llvm-10 (https://koji.fedoraproject.org/koji/taskinfo?taskID=52444549) and the test crashes there as well. So this could really be a Rust issue, not an LLVM one.

[sharkcz@devel10 rust-coreos-installer]$ /home/sharkcz/rust-coreos-installer/coreos-installer-0.7.0/target/release/deps/libcoreinst-135b46c0efd4c9e3

running 16 tests
test blockdev::tests::disk_sector_size_reader ... ok
test blockdev::tests::lsblk_split ... ok
test blockdev::tests::test_saved_partitions ... ok
test cmdline::tests::test_parse_partition_filters ... ok
test download::tests::test_image_copy_default_first_mb ... ok
test download::tests::test_write_image_limit ... Segmentation fault (core dumped)


For the record: my builds were done on a single-CPU system. The output is different on a multi-CPU system.

Comment 4 Josh Stone 2020-09-29 21:19:16 UTC
It looks like upstream coreos-installer has been dancing around s390x issues:

https://github.com/coreos/coreos-installer/pull/360
https://github.com/coreos/coreos-installer/issues/372
https://github.com/coreos/coreos-installer/pull/373

I don't know if those changes have anything to do with the test in question here.

Comment 5 Benjamin Gilbert 2020-09-29 22:02:43 UTC
All three of those issues came down to an LTO bug in Rust 1.43 and 1.44: https://github.com/coreos/coreos-installer/issues/372#issuecomment-686424629.  We didn't see it in FCOS, which was already on 1.45.  We ended up making no net code changes for it (there's a PR plus a revert) and just disabled LTO in the RHCOS package.

The issue reported here is not known to be related.  I could try disabling LTO in the package.

Comment 6 Benjamin Gilbert 2020-09-29 22:17:33 UTC
Looks like it also fails with LTO disabled: https://koji.fedoraproject.org/koji/taskinfo?taskID=52479791

Comment 7 Josh Stone 2020-09-29 22:19:28 UTC
Ah, right, I think that s390x LTO issue was bug 1837660. That was a build failure, but I guess it could break in other ways. Current LLVM shouldn't be affected by that particular issue.

Comment 8 Josh Stone 2020-09-30 17:45:35 UTC
I can reproduce this, but only in optimized builds with "-Ccodegen-units=1", which rust-packaging %cargo_prep sets in ".cargo/config". So if you need an immediate workaround, you could edit that file to increase or remove that argument. I believe the default is 16 in release builds.

I'll try to bisect the rust change, but the build is really slow on the beaker machine I got...
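
The workaround above (raising or dropping the "-Ccodegen-units=1" argument) can be sketched roughly as follows. This is only an illustration, assuming the flags sit in a `rustflags` array in ".cargo/config"; the exact contents written by %cargo_prep are not shown in this report, so the sample flag list and both helper names (`adjust_codegen_units`, `render_rustflags`) are hypothetical.

```rust
/// Prefix of the rustc flag that the workaround targets.
const CODEGEN_UNITS_PREFIX: &str = "-Ccodegen-units=";

/// Returns a copy of `flags` in which every `-Ccodegen-units=N` entry is
/// removed (when `units` is `None`) or rewritten to the requested value.
/// All other flags are passed through unchanged and in order.
fn adjust_codegen_units(flags: &[&str], units: Option<u32>) -> Vec<String> {
    flags
        .iter()
        .filter_map(|flag| {
            if flag.starts_with(CODEGEN_UNITS_PREFIX) {
                // Drop the flag entirely, or replace it with the new count.
                units.map(|n| format!("{}{}", CODEGEN_UNITS_PREFIX, n))
            } else {
                Some(flag.to_string())
            }
        })
        .collect()
}

/// Renders the flags as a TOML `rustflags = [...]` line, as it might
/// appear in ".cargo/config" (assumed layout, not taken from the package).
fn render_rustflags(flags: &[String]) -> String {
    let quoted: Vec<String> = flags.iter().map(|f| format!("\"{}\"", f)).collect();
    format!("rustflags = [{}]", quoted.join(", "))
}
```

Calling `adjust_codegen_units(&flags, None)` removes the argument so rustc falls back to its default, while `Some(16)` sets it explicitly; the result of `render_rustflags` would then replace the existing `rustflags` line in the file.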

Comment 9 Benjamin Gilbert 2020-09-30 22:58:21 UTC
Josh, thanks for the workaround.  Applied in rust-coreos-installer-0.7.0-2.fc34.

Comment 10 Josh Stone 2020-10-15 00:28:23 UTC
I've confirmed that the upstream LLVM patch fixes the problem here.
https://reviews.llvm.org/D89034

Comment 11 Josh Stone 2020-10-19 17:23:45 UTC
This should be fixed in rawhide now -- do the coreos folks need this backported to stable branches too?

Comment 12 Benjamin Gilbert 2020-10-19 17:48:53 UTC
It'd be useful, but not required.  Per comment 9 we have a workaround in the coreos-installer package.

Thanks for all your work to track this down!

Comment 13 Ben Cotton 2021-02-09 15:19:10 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

