Bug 548989
| Field | Value |
|---|---|
| Summary | PI mutexes are broken (again) |
| Product | [Fedora] Fedora |
| Component | glibc |
| Version | rawhide |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | high |
| Priority | medium |
| Reporter | Bruno Wolff III <bruno> |
| Assignee | Andreas Schwab <schwab> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | anton, ascii79, awilliam, davidsen, dodji, dougsland, erik-fedora, gansalmon, geoff+fedora, itamar, jakub, kasal, kernel-maint, lkundrak, lpoetter, M8R-qx9aop, mschmidt, n12367, pascal, paul, phuang, redhat, schnell, schwab, scorporat, tomek, wtogami |
| Keywords | Triaged |
| Fixed In Version | glibc-2.11.90-10 |
| Doc Type | Bug Fix |
| Last Closed | 2010-01-21 09:46:55 UTC |
| Bug Blocks | 538274 |

Description
Bruno Wolff III
2009-12-19 20:58:05 UTC
I am pretty sure this is again the fault of the PI mutex code in the kernel. Will verify.

Yes, verified. Either the kernel or glibc are at fault here. Reassigning. (RT folks, why do you break this every second month? Broken PI mutex unlocking is a recurring story...)

*** Bug 538638 has been marked as a duplicate of this bug. ***
*** Bug 548259 has been marked as a duplicate of this bug. ***
*** Bug 541420 has been marked as a duplicate of this bug. ***
*** Bug 541359 has been marked as a duplicate of this bug. ***
*** Bug 549134 has been marked as a duplicate of this bug. ***

(In reply to comment #7)
> *** Bug 549134 has been marked as a duplicate of this bug. ***

That one has a different backtrace. The assertion failure was in pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all the others. Not sure if it is really duplicate.

And all the other duplicates are from i686.

(In reply to comment #2)
> Yes, verified. Either the kernel or glibc are at fault here.

Lennart, how do you verify it? Do you have a simple test case?

(In reply to comment #8)
> (In reply to comment #7)
> > *** Bug 549134 has been marked as a duplicate of this bug. ***
>
> That one has a different backtrace. The assertion failure was in
> pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all
> the others. Not sure if it is really duplicate.
>
> And all the other duplicates are from i686.

I looked at these since my report was classified as a duplicate as well, and I'm uncertain if this is just some damage in another place from a common problem, or if all metacity bugs were classified as duplicates. On initial reading I'm with you, I don't think 549134 is the same thing. I do think that what we are all seeing in the i686 is more common and should be addressed first, then 549134 can be retested.

(In reply to comment #9)
> (In reply to comment #2)
> > Yes, verified. Either the kernel or glibc are at fault here.
>
> Lennart,
> how do you verify it? Do you have a simple test case?

In the PA dev tree is a relatively simple example which I used, but it's not exactly trivial to compile because the dev tree pulls in quite a few dependencies.

(In reply to comment #8)
> (In reply to comment #7)
> > *** Bug 549134 has been marked as a duplicate of this bug. ***
>
> That one has a different backtrace. The assertion failure was in
> pa_pstream_send_tagstruct_with_creds, not in pa_mutex_unlock like it was in all
> the others. Not sure if it is really duplicate.

Oops. Good catch. I have split this up again now. Got confused by the bt since it also included an _unlock() call...

*** Bug 549247 has been marked as a duplicate of this bug. ***

bug 549247 suggests glibc needs fixing, not the kernel. Tentatively reassigning.

I also tried going back to an older glibc. Downgrading to glibc-2.11.90-3.i686 (same for other packages from that src rpm) got sound working again.

Created attachment 379689 [details]
simple testcase
I was able to reproduce in an i686 KVM guest with F12 + glibc-2.11.90-4 from Rawhide. Here's a minimal testcase, loosely based on thread-mainloop-test.c from Pulseaudio.
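
The attached testcase itself is not reproduced in this thread. As a rough illustration only, a check of the same pattern (a priority-inheritance mutex used together with pthread_cond_wait, followed by an unlock) might look like the sketch below. The function name `run_pi_cond_test` and the helper variables are hypothetical; the sketch assumes, based on the backtraces mentioned above, that the failure shows up as a non-zero return from pthread_mutex_unlock.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical shared state for this sketch (not taken from the attachment). */
static pthread_mutex_t lock;                 /* initialised as a PI mutex below */
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool waiting;                         /* waiter holds/has held the lock */
static bool ready;                           /* predicate the waiter sleeps on */
static int waiter_unlock_result;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    waiting = true;
    while (!ready)
        pthread_cond_wait(&cond, &lock);     /* releases and re-acquires lock */
    /* Record the unlock result after returning from the wait. */
    waiter_unlock_result = pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 0 if every checked call succeeded, otherwise an errno-style code. */
int run_pi_cond_test(void)
{
    pthread_mutexattr_t attr;
    pthread_t thread;
    bool seen = false;
    int r;

    waiting = ready = false;
    waiter_unlock_result = -1;

    if ((r = pthread_mutexattr_init(&attr)) != 0)
        return r;
    r = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (r == 0)
        r = pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (r != 0)
        return r;

    if ((r = pthread_create(&thread, NULL, waiter, NULL)) != 0) {
        pthread_mutex_destroy(&lock);
        return r;
    }

    /* Once we can take the lock and see waiting == true, the waiter must
     * have released the lock inside pthread_cond_wait. */
    while (!seen) {
        pthread_mutex_lock(&lock);
        seen = waiting;
        pthread_mutex_unlock(&lock);
        if (!seen)
            sched_yield();
    }

    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cond);
    r = pthread_mutex_unlock(&lock);

    pthread_join(thread, NULL);
    pthread_mutex_destroy(&lock);
    return r != 0 ? r : waiter_unlock_result;
}
```

Calling `run_pi_cond_test()` from a small `main` and building with `-pthread` should return 0 on a working glibc. Whether it fails on the affected i686 builds has not been checked here.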
This should block beta as basic sound functionality is a beta criterion.

*** Bug 514060 has been marked as a duplicate of this bug. ***

Created attachment 380966 [details]
possible fix

glibc-2.11.90-4 introduced requeue-PI support on i386 (x86_64 already had it). It seems the problem is in the new code. This is the upstream commit:
http://sourceware.org/git/?p=glibc.git;a=commit;h=75956694f3f80a1c32389c95069641f52c236c8b
I reviewed it and I believe I found the bug in pthread_cond_wait. I am attaching a possible fix. So far it's completely untested, I'm building glibc with it now.

Created attachment 381009 [details]
Fix pthread_cond_wait with requeue-PI on i386
The previous patch caused a segfault. Here's a new one. It works for me. It fixes pthread_cond_timedwait too, though I tested only pthread_cond_wait so far.
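
Since the comment above says only pthread_cond_wait was tested, a check of the pthread_cond_timedwait path could be sketched as follows. This is an assumption-laden illustration, not part of the patch: the function name `timedwait_then_unlock` is hypothetical, and the sketch only exercises the timeout case (no other thread ever signals the condition variable) on a priority-inheritance mutex.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Waits on a condition variable that nobody signals, using a PI mutex,
 * then unlocks. Stores the final timedwait result in *wait_result and
 * returns the result of setup or of pthread_mutex_unlock (0 on success). */
int timedwait_then_unlock(int *wait_result)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct timespec deadline;
    int r;

    if ((r = pthread_mutexattr_init(&attr)) != 0)
        return r;
    r = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (r == 0)
        r = pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (r != 0)
        return r;
    if ((r = pthread_cond_init(&cond, NULL)) != 0) {
        pthread_mutex_destroy(&lock);
        return r;
    }

    /* Absolute deadline 50 ms from now; the default condvar clock is
     * CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 50L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    do {
        /* Loop so that a spurious wakeup (return value 0) is retried. */
        *wait_result = pthread_cond_timedwait(&cond, &lock, &deadline);
    } while (*wait_result == 0);
    r = pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return r;
}
```

On a working glibc the wait is expected to end with ETIMEDOUT and the unlock to return 0. A non-zero return from the function would point at the unlock, or at setup, rather than at the wait itself.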
Scratch build of glibc with the patch in Koji:
http://kojipkgs.fedoraproject.org/scratch/michich/task_1896278/

Looks like I broke something else though. From the build.log:
tst-robustpi8: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed.
Didn't expect signal from child: got `Aborted'

*** Bug 552512 has been marked as a duplicate of this bug. ***
*** Bug 552553 has been marked as a duplicate of this bug. ***
*** Bug 552590 has been marked as a duplicate of this bug. ***
*** Bug 552595 has been marked as a duplicate of this bug. ***

(In reply to comment #21)
> Scratch build of glibc with the patch in Koji:
> http://kojipkgs.fedoraproject.org/scratch/michich/task_1896278/
>
> Looks like I broke something else though. From the build.log:
> tst-robustpi8: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion
> `(-(e)) != 3 || !robust' failed.
> Didn't expect signal from child: got `Aborted'

I could neither reproduce this testsuite failure locally nor did it occur in another try in Koji. Dinakar Guniguntala (the author of the requeue-PI patch for i386) found nothing wrong with my fix.

Did anyone test the Koji build? Did it fix the bug? Were there any problems?
It seems to have expired already, but here's the repeated one:
http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/

Just tested the latest build, and at least youtube videos work fine in firefox again.
glibc-devel-2.11.90-4.m3.i686
glibc-common-2.11.90-4.m3.i686
glibc-debuginfo-2.11.90-4.i686
glibc-headers-2.11.90-4.m3.i686
glibc-2.11.90-4.m3.i686

*** Bug 553756 has been marked as a duplicate of this bug. ***

Recompiled glibc-2.11.90-4 with last patch - works fine for me.

(In reply to comment #26)
> Did anyone test the Koji build? Did it fix the bug? Were there any problems?
> It seems to have expired already, but here's the repeated one:
> http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/

I witness that this build of glibc* fixes the problem for me.

(In reply to comment #30)
> > Did anyone test the Koji build? Did it fix the bug? Were there any problems?
> > It seems to have expired already, but here's the repeated one:
> > http://kojipkgs.fedoraproject.org/scratch/michich/task_1904693/
>
> I witness that this build of glibc* fixes the problem for me.

Yes, it fixed it for me too. I am currently running Rawhide with those RPMs on i686 just fine.

*** Bug 552544 has been marked as a duplicate of this bug. ***
*** Bug 555262 has been marked as a duplicate of this bug. ***

The patch is now applied upstream:
http://sourceware.org/git/?p=glibc.git;a=commit;h=893549c5a06956d2559391a3ffdeb6ded53b65c0