Bug 1412204
Summary: | rust network test suite stuck | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Dan Horák <dan> |
Component: | rust | Assignee: | Rust SIG <rust-sig> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 25 | CC: | bugproxy, hannsj_uhl, jistone, TicoTimo |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | s390x | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-06-20 13:50:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 467765, 1410952 |
Description
Dan Horák
2017-01-11 14:17:41 UTC
potential real cause is in bug 1410052

I believe this is another codegen bug in LLVM 3.8.

All of the rawhide s390x builds have been fine, so I think we can rule out kernel problems in the builder. Then I tried a test binary compiled in f25, running in a rawhide chroot, and that still hangs, so it doesn't seem to be fixed by newer glibc etc. Then I tried in f25 with the bundled LLVM (3.9-ish), and that has been working perfectly!

As for blaming pthread_rwlock_tryrdlock, the gdb errors need to be resolved before you can trust the reported locations. I had similar errors as you trying to attach in the chroot, and found that the buildids that gdb was looking at were from the host versions of those libraries. (I'm not even sure how it's getting there from the chroot!)

But it does work to launch the test binary directly from gdb, then I see:

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x3fffdd022a0 (LWP 52925) "stdtest-s390x-u" 0x000003fffdfb162e in __pthread_cond_timedwait (cond=0x2aa002c7060, mutex=0x2aa002c7010, abstime=0x3ffffffaeb8) at pthread_cond_timedwait.c:198
  3    Thread 0x3fffdb00910 (LWP 52929) "net::tcp::tests" __pthread_cond_wait (cond=0x3fff0000a20, mutex=0x3fff00014b0) at pthread_cond_wait.c:189
  17   Thread 0x3fffdd00910 (LWP 52943) "net::tcp::tests" __pthread_cond_wait (cond=0x3fff8000d90, mutex=0x3fff8001100) at pthread_cond_wait.c:189

Thread 1 is just the main thread spawning tests.

(gdb) thread 3
[Switching to thread 3 (Thread 0x3fffdb00910 (LWP 52929))]
#0  __pthread_cond_wait (cond=0x3fff0000a20, mutex=0x3fff00014b0) at pthread_cond_wait.c:189
189       __pthread_disable_asynccancel (cbuffer.oldtype);
(gdb) backtrace
#0  __pthread_cond_wait (cond=0x3fff0000a20, mutex=0x3fff00014b0) at pthread_cond_wait.c:189
#1  0x000002aa0016fc4a in std::sys::imp::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sys/unix/condvar.rs:64
#2  std::sys_common::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sys_common/condvar.rs:51
#3  std::sync::condvar::Condvar::wait<bool> (self=<optimized out>, guard=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/condvar.rs:125
#4  0x000002aa0007cfc2 in std::thread::park () at /builddir/build/BUILD/rustc-1.14.0/src/libstd/thread/mod.rs:466
#5  0x000002aa0014d212 in std::sync::mpsc::blocking::WaitToken::wait (self=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/blocking.rs:81
#6  std::sync::mpsc::shared::Packet<(std::net::tcp::TcpStream, std::net::addr::SocketAddr)>::recv<(std::net::tcp::TcpStream, std::net::addr::SocketAddr)> (self=<optimized out>, deadline=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/shared.rs:237
#7  std::sync::mpsc::Receiver<(std::net::tcp::TcpStream, std::net::addr::SocketAddr)>::recv<(std::net::tcp::TcpStream, std::net::addr::SocketAddr)> (self=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/mod.rs:883
#8  0x000002aa001c4708 in std::net::tcp::tests::clone_accept_concurrent::{{closure}} (addr=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:996
#9  0x000002aa000eb55c in std::net::tcp::tests::each_ip (f=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:444
#10 std::net::tcp::tests::clone_accept_concurrent () at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:974
#11 0x000002aa001d6616 in test::run_test::{{closure}} () at /builddir/build/BUILD/rustc-1.14.0/src/libtest/lib.rs:1265
[...]

(gdb) thread 17
[Switching to thread 17 (Thread 0x3fffdd00910 (LWP 52943))]
#0  __pthread_cond_wait (cond=0x3fff8000d90, mutex=0x3fff8001100) at pthread_cond_wait.c:189
189       __pthread_disable_asynccancel (cbuffer.oldtype);
(gdb) backtrace
#0  __pthread_cond_wait (cond=0x3fff8000d90, mutex=0x3fff8001100) at pthread_cond_wait.c:189
#1  0x000002aa0016fc4a in std::sys::imp::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sys/unix/condvar.rs:64
#2  std::sys_common::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sys_common/condvar.rs:51
#3  std::sync::condvar::Condvar::wait<bool> (self=<optimized out>, guard=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/condvar.rs:125
#4  0x000002aa0007cfc2 in std::thread::park () at /builddir/build/BUILD/rustc-1.14.0/src/libstd/thread/mod.rs:466
#5  0x000002aa00135bd8 in std::sync::mpsc::blocking::WaitToken::wait (self=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/blocking.rs:81
#6  std::sync::mpsc::shared::Packet<()>::recv<()> (self=<optimized out>, deadline=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/shared.rs:237
#7  0x000002aa0014b9ae in std::sync::mpsc::Receiver<()>::recv<()> (self=<optimized out>) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/sync/mpsc/mod.rs:883
#8  0x000002aa001c280c in std::net::tcp::tests::clone_while_reading::{{closure}} (addr=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:949
#9  0x000002aa000eb36c in std::net::tcp::tests::each_ip (f=...) at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:444
#10 std::net::tcp::tests::clone_while_reading () at /builddir/build/BUILD/rustc-1.14.0/src/libstd/net/tcp.rs:916
#11 0x000002aa001d6616 in test::run_test::{{closure}} () at /builddir/build/BUILD/rustc-1.14.0/src/libtest/lib.rs:1265

In both cases, they are stuck in a mpsc channel recv[1] at the end of their respective tests, waiting for their sub-threads to exit. But we can see in GDB that there are no other threads, so those must have already exited and we missed their mpsc send!

[1] https://doc.rust-lang.org/std/sync/mpsc/struct.Receiver.html#method.recv

If I run just the sync::mpsc tests, it also gets stuck. I think we're only seeing it get stuck in net::tcp because those are alphabetically first and happen to also use mpsc. Rust runs tests in parallel up to the number of CPUs, so once we get a hung test for each CPU, we're hosed.

TL;DR Rust's mpsc is misbehaving, but then it works with any LLVM 3.9, so it appears to be a codegen issue with 3.8.

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

resolved via bug 1410952 (llvm got updated to 3.9 in F-25)
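
For anyone trying to reproduce this kind of hang, the pattern that both stuck tests end in (spawn sub-threads, then block on an mpsc receiver until each one reports in) can be sketched roughly as below. This is only an illustration, not the code from the libstd test suite: the function name `wait_for_workers`, the worker count, and the timeout values are assumptions made up for the example. It uses `recv_timeout()` in place of the blocking `recv()` seen in the backtraces, so a missed send would show up as an error rather than a thread parked forever.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: spawns `workers` threads that each send `()` when
/// done, then waits for one notification per worker. Each individual wait
/// gives up after `timeout` instead of blocking indefinitely.
fn wait_for_workers(workers: usize, timeout: Duration) -> Result<usize, RecvTimeoutError> {
    let (tx, rx) = channel::<()>();

    for _ in 0..workers {
        let tx = tx.clone();
        thread::spawn(move || {
            // The worker's real job would run here.
            // Ignore the error if the receiver has already given up.
            let _ = tx.send(());
        });
    }
    // Drop the original sender so that only the workers hold senders.
    drop(tx);

    let mut received = 0;
    for _ in 0..workers {
        // A plain `rx.recv()` at this point is where a test would park
        // forever if a send were missed; the timeout turns that into an Err.
        rx.recv_timeout(timeout)?;
        received += 1;
    }
    Ok(received)
}

fn main() {
    match wait_for_workers(4, Duration::from_secs(5)) {
        Ok(n) => println!("all {} workers reported in", n),
        Err(e) => println!("gave up waiting: {:?}", e),
    }
}
```

On a correctly built toolchain the `Ok` branch should be taken every time. A timeout here, with no worker threads left alive, would match the symptom described above.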