Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1871087 - lbzip2 testsuite hangs on s390x when built with LTO
Summary: lbzip2 testsuite hangs on s390x when built with LTO
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: lbzip2
Version: 34
Hardware: s390x
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Dominik 'Rathann' Mierzejewski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker F-ExcludeArch-s390x
TreeView+ depends on / blocked
 
Reported: 2020-08-21 10:10 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2021-06-18 07:18 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Dominik 'Rathann' Mierzejewski 2020-08-21 10:10:30 UTC
Description of problem:
Test suite hangs on F33+
...
PASS: manual-expand.test 15 a20fc1f69cbf5e491cbd9a06352cb70d19e3e153
PASS: manual-expand.test 16 a904f3028c0ed3c565a1fb5c69404415f7a8f1d4
PASS: manual-expand.test 17 b229cc0d0253184fdf74b29dddaa8f11ea3f2e2f
PASS: manual-expand.test 18 da39a3ee5e6b4b0d3255bfef95601890afd80709
make[3]: Leaving directory '/builddir/build/BUILD/lbzip2-b6dc48a7b9bfe6b340ed1f6d72133608ad57144b/tests'
<hang>

Version-Release number of selected component (if applicable):
2.5-16.20171011gitb6dc48a

Comment 1 Dan Horák 2020-08-21 10:39:49 UTC
Seems to be LTO runtime related, with %global _lto_cflags %{nil} the rpm is built.

Comment 2 Dan Horák 2020-08-21 10:42:02 UTC
And for the record, build fails early in the build-aux/autogen.sh call when nspawn isolation is used (I guess on all arches), works OK in chroot.

Comment 3 Dominik 'Rathann' Mierzejewski 2020-08-21 12:12:27 UTC
(In reply to Dan Horák from comment #1)
> Seems to be LTO runtime related, with %global _lto_cflags %{nil} the rpm is
> built.

Confirmed and opt-out applied for s390x only, thank you!

Cc'ing Jeff.

(In reply to Dan Horák from comment #2)
> And for the record, build fails early in the build-aux/autogen.sh call when
> nspawn isolation is used (I guess on all arches), works OK in chroot.

It doesn't in koji anymore. glibc-2.32.9000-4.fc34 fixes this.

By the way, could you update the fedora-gpg-keys packages on your lfedora2 s390x test VM? It's unable to build any rawhide packages in mock:
...
Public key for zstd-1.4.5-5.fc33.s390x.rpm is not installed. Failing package is: zstd-1.4.5-5.fc33.s390x
 GPG Keys are configured as: file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-33-primary, file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-33-primary, file:///usr/share/distribution-gpg-keys/fedora/RPM-GPG-KEY-fedora-32-primary
Error: GPG check FAILED

Comment 4 Dan Horák 2020-08-21 12:28:32 UTC
some details from the stuck test, I wonder if there could be a real deadlock hidden somewhere in the app and LTO exposes it

ps output:
 110995 ?        Ss     0:00 /usr/bin/rpmbuild -bb --target s390x --nodeps /builddir/build/SPECS/lbzip2.spec
 131144 ?        S      0:00 /bin/sh -e /var/tmp/rpm-tmp.F1FJ3s
 131145 ?        S      0:00 /usr/bin/make -O -j8 V=1 VERBOSE=1 check
 131146 ?        S      0:00 /bin/sh -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target
 131170 ?        S      0:00 /bin/sh -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target
 131171 ?        S      0:00 /usr/bin/make check
 131205 ?        S      0:00 /usr/bin/make check-TESTS check-local
 131211 ?        S      0:00 /bin/sh -c set +e; bases='manual-compress.log fuzz-collect.log fuzz-divbwt.log manual-expand.log'; bases=`for i in $bases; do echo $i; done | sed 's/\.log$//'`; 
 131220 ?        S      0:00 /usr/bin/make test-suite.log TEST_LOGS=manual-compress.log fuzz-collect.log fuzz-divbwt.log manual-expand.log
 131221 ?        S      0:00 /bin/sh -c p='manual-compress.test'; \ case 'manual-compress.log' in */*) case 'manual-compress' in */*) b='manual-compress';; *) b=`echo 'manual-compress.log' |
 131237 ?        S      0:00 /bin/sh ../build-aux/tap-driver.sh --test-name manual-compress.test --log-file manual-compress.log --trs-file manual-compress.trs --color-tests no --enable-hard-
 131241 ?        S      0:00 /bin/sh ../build-aux/tap-driver.sh --test-name manual-compress.test --log-file manual-compress.log --trs-file manual-compress.trs --color-tests no --enable-hard-
 131242 ?        S      0:00 gawk -v me=tap-driver.sh -v test_script_name=manual-compress.test -v log_file=manual-compress.log -v trs_file=manual-compress.trs -v expect_failure=0 -v merge=0 
 131243 ?        S      0:00 ./driver compress suite/manual-compress
 131811 ?        S      0:00 sh -c cat >&3
 131814 ?        S      0:00 cat
 131933 ?        Sl     0:00 lbzip2

[sharkcz@rock-kvmlp-fedora lbzip2]$ sudo strace -p 131933
[sudo] password for sharkcz: 
strace: Process 131933 attached
rt_sigsuspend([PIPE XFSZ], 8

^Cstrace: Process 131933 detached
 <detached ...>


after attaching gdb to the process

(gdb) thread apply all bt

Thread 10 (LWP 131944):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 9 (LWP 131943):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 8 (LWP 131942):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 7 (LWP 131941):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 6 (LWP 131940):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 5 (LWP 131939):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 4 (LWP 131938):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d12c70 in worker_thread_proc () at process.c:452
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 3 (LWP 131936):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d128ba in sink_thread_proc () at process.c:373
#2  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#4  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 2 (LWP 131935):
#0  0x000003ff8298fc88 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000002aa28d152c4 in worker_thread_proc () at process.c:427
#2  primary_thread () at process.c:529
#3  0x000002aa28d123ee in thread_entry (real_entry=<optimized out>) at process.c:82
#4  0x000003ff82989556 in start_thread () from target:/lib64/libpthread.so.0
#5  0x000003ff828a04f6 in thread_start () from target:/lib64/libc.so.6

Thread 1 (LWP 131933):
#0  0x000003ff827cae7a in sigsuspend () from target:/lib64/libc.so.6
#1  0x000002aa28d03458 in halt () at signals.c:233
#2  schedule (proc=0x2aa28d26a28 <compression>) at process.c:611
#3  work () at process.c:654
#4  main (argc=<optimized out>, argv=<optimized out>) at main.c:934

Comment 5 Jeff Law 2020-08-21 15:23:33 UTC
It could well be the case there's a concurrency issue that is pre-existing and possibly made worse by LTO -- particulary if there's shared state across the threads which is not properly marked.  Has anyone tried the thread sanitizer on this code?

Comment 6 Ben Cotton 2021-02-09 15:16:09 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.


Note You need to log in before you can comment on or make changes to this bug.