Bug 532307
Field | Value
---|---
Summary: | [abrt] pulseaudio looks to be crashing empathy (pthread_setspecific() fails with EINVAL)
Product: | [Fedora] Fedora
Component: | empathy
Version: | 12
Hardware: | i686
OS: | Linux
Status: | CLOSED ERRATA
Severity: | high
Priority: | high
Keywords: | Triaged
Reporter: | Karel Klíč <kklic>
Assignee: | Peter Gordon <peter>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | akomano, aks03081991, ameya.gore, anwarbaik88, asiminski, bdpepple, bschueler, bugzilla.redhat, chache, charles.liu23, crzand, ddougher, digitalvectorz, gjalves, hafflys, ink08, intheshow2, jmarcth, johnbstroud, jplorier, ken, konmpar, korbe, kparnell, lebosse.nicolas, linux, linuxnow, livermob, lkundrak, lpoetter, luigi.cardeles, luis.medina, mbooth, mcepl, mdhensley, menezgrunge88, mschmidt, msdeleonpeque, ms, mutuussentire, mz1550, n12367, pal666, peter, philip, rayne.sierra, rebelinux, redhat2, richard, richard.vrsnik, rod.c.johnson, rvokal, sarrab1976, sean.stangl, sebastien.willmann, sergey.linux, smartcheetahbr, stickster, tarek.ahmed.omar, thanosk, theophanis_kontogiannis, tomek, vdanielmo, vwfoxguru, wellspring3, wmello, wtogami, yantrikig, yn.abid, yulrottmann
Whiteboard: | abrt_hash:6d89414b430cfc21f4e09e554edaf108eb1fc1b4
Fixed In Version: | 2.28.2-2.fc12
Doc Type: | Bug Fix
Last Closed: | 2010-01-26 00:56:19 UTC
Bug Blocks: | 554899
Description
Karel Klíč
2009-11-01 13:14:49 UTC
Created attachment 367009 [details]
File: backtrace
Again the same crash today.

Looking at the backtrace it looks like this crash is caused by pulseaudio. Reassigning bug.

Hmm, that's pthread_setspecific() failing. I don't see how that could ever fail, especially since we call pthread_getspecific() right before. Is there any reliable way to reproduce this? I'd be very interested in the exact return value of pthread_setspecific() there.

It crashes about once a day. I'll try to get the return value.

*** Bugs 533726, 533576, 533923 have been marked as duplicates of this bug. ***

Any luck so far?

It seems this bug is triggered much less often when Empathy runs within gdb. It crashed on Friday, but I failed to get the pthread_setspecific() return value (gdb crashed when I tried to reload the debug info). I still run Empathy with gdb.

I'll try to add some debugging output to PulseAudio and recompile it. Then I can run it without gdb.

When I download the pulseaudio Fedora CVS repository, run "make local" in F-12 and install the result (all rpms, or just the pulseaudio-0.9.19-2 rpm, or just libpulsecore-*.so), the pulseaudio daemon cannot start:

  Nov 11 21:03:33 localhost pulseaudio[2718]: fdsem.c: Assertion 'pa_atomic_dec(&f->data->waiting) >= 1' failed at pulsecore/fdsem.c:283, function pa_fdsem_before_poll(). Aborting.

Is there some other way to run the patched version?

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

*** Bugs 537831, 538140 have been marked as duplicates of this bug. ***

Created attachment 369945 [details]
Another backtrace
*** Bugs 539588, 539979, 540880, 541066, 541403, 541498, 541945, 542448, 543089, 543881, 544015, 544101, 544457, 544462, 544757, 544846, 545240, 545421, 545455, 545616, 545885, 546076, 546112, 546335 have been marked as duplicates of this bug. ***

(In reply to comment #9)
> Any luck so far?

Lennart,

I'm also getting this crash daily, and by the looks of it, so are a bunch of other people.

I've just installed a patched version of pulseaudio which should give the return value of pthread_setspecific when it inevitably crashes tomorrow. However, I'm going to go out on a limb and say that from the 2 possibilities (ENOMEM and EINVAL), it's going to be EINVAL. Looking at the code in thread.h, the most obvious reason for this would be use of the thread local object after its destructor had been called.

Not being familiar at all with this codebase myself, does that sound right? What additional debug info would you want?

Matt

Created attachment 377717 [details]
Patch to collect requested debug information when pthread_setspecific() fails
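The attached patch itself is not quoted in this thread. As a rough illustration only, a check of this kind could look like the sketch below. The helper name `tls_set_logged()` and its buffer parameters are hypothetical and are not taken from the attachment or from PulseAudio; the sketch simply captures the return value of pthread_setspecific() and formats it as "<code>: <message>".

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the attached patch): store a value under a TLS key
 * and, if pthread_setspecific() fails, write "<code>: <message>" into buf and
 * print it to stderr. Returns the raw pthread_setspecific() result. */
int tls_set_logged(pthread_key_t key, const void *value, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    int rc = pthread_setspecific(key, value);
    if (rc != 0) {
        /* pthread functions return the error code directly; errno is not set. */
        snprintf(buf, len, "%d: %s", rc, strerror(rc));
        fprintf(stderr, "pthread_setspecific() failed: %s\n", buf);
    }
    return rc;
}
```

Called with a key that has already been deleted, this should report EINVAL on glibc; POSIX leaves the use of a deleted key undefined, so other C libraries may behave differently.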
Right on cue, it crashed again this morning. This time I'd applied the patch from Comment #41 and rebuilt pulseaudio. As expected, output was:

  22: Invalid argument

*** Bugs 546926, 547038, 538474, 546857, 539726, 533435, 547452, 539838, 539854, 544621, 543988, 532484, 547537, 547841, 547885, 547995 have been marked as duplicates of this bug. ***

(In reply to comment #40)
> (In reply to comment #9)
> > Any luck so far?
>
> Lennart,
>
> I'm also getting this crash daily, and by the looks of it, so are a bunch of
> other people.
>
> I've just installed a patched version of pulseaudio which should give the
> return value of pthread_setspecific when it inevitably crashes tomorrow.
> However, I'm going to go out on a limb and say that from the 2 possibilities
> (ENOMEM and EINVAL), it's going to be EINVAL. Looking at the code in thread.h,
> the most obvious reason for this would be use of the thread local object after
> its destructor had been called.

That is unlikely. We actually build the library with -z nodelete precisely to avoid issues like that.
Thanks for figuring out that EINVAL is the error cause. Unfortunately, this is still not precise enough to figure out fully what is going on here...

*** Bugs 548631, 548931, 548937, 549117, 549131, 549203, 549214, 549215, 549591, 549712, 549693, 549950, 550125 have been marked as duplicates of this bug. ***

I found something that may be of interest. I have had Abiword crash and the traceback always referred to Pulse Audio. My bug report got marked as a duplicate of this bug, so I am providing input here.

On a whim, I selected Preferences/Sound. On the Sound Effects tab, I had previously selected the checkbox for "Enable window and button sounds". I decided to uncheck the box and try the same operation (paste text into Abiword, then select it and attempt to resize it) that would previously cause Abiword to abend immediately. This time, however, Abiword completed the operations without fault.

To summarize:

Sound Preferences "Enable window and button sounds" enabled: Abiword crashes.
Sound Preferences "Enable window and button sounds" disabled: Abiword works.

*** Bugs 550518, 551036, 551041, 551281, 551298, 551633, 551659, 551663, 551662, 551745, 551799, 541398, 543630, 544365, 544690, 550365, 550490, 552064, 552116 have been marked as duplicates of this bug. ***

I can confirm Stephen Haffly's steps to reproduce and make them more specific. This is 100% reproducible for me:

1. Make sure you have "Enable window and button sounds" enabled in gnome-volume-control.
2. Run "abiword".
3. Paste some text into Abiword from Firefox or OpenOffice.org Writer. (Use one of these two applications in order to have the text copied as rich text with formatting. Pasting simple text from gedit or gnome-terminal won't reproduce the bug. It does not matter whether you use select and middle-click or CTRL+C, CTRL+V.)
4. Now almost any action (resizing text, clicking in menus, ...) in Abiword will crash it with:

  Assertion 'pthread_setspecific(t->key, userdata) == 0' failed at pulsecore/thread-posix.c:200, function pa_tls_set(). Aborting.

*** Bugs 552369, 552413, 552544, 552597, 552740, 552927, 553025, 553095, 553183, 553235, 553362, 544457 have been marked as duplicates of this bug. ***

Oh man this is so stupid. I just dropped the majority of the duplicates from this bug again because they have NOTHING to do with PA. Guys, this is not a dumpster for your bugs you don't have any use for anymore.

Please, from now on this bug should be only about PA related crashes, more specifically about pthread_setspecific() failing in pa_tls_set(), nothing else. If you get an abort() in the pa_tls_set() stack frame this is where to duplicate it to, but please, don't dup any other bugs on this, I have a hard time reading through all the noise here. Thanks.

*** Bugs 545370, 546820 have been marked as duplicates of this bug. ***

Hmm, I have not been able to reproduce this unfortunately. Not sure where to begin debugging. The hints in #93 did not cause this issue to be hit for me. Michal, is that on 32bit or 64bit?

Does anyone else have a good idea how I could reproduce this issue?

(In reply to comment #109)
> Hmm, I have not been able to reproduce this unfortunately. Not sure where to
> begin debugging. The hints in #93 did not cause this issue to be hit for me.
> Michal, is that on 32bit or 64bit?

I'm using x86_64. F12 with updates-testing enabled.

I added a few debug prints in src/pulsecore/thread-posix.c to debug PA's TLS usage.
When running the steps to reproduce using abiword, the result was this:

  pa_tls_new, pthread=0x7fb5dde4d710, tid=3045: created key 4
  pa_tls_set, pthread=0x7fb5dde4d710, tid=3045: replacing value for key 4. previous=(nil) new=0x1d29fb0
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's (nil)
  pa_tls_set, pthread=0x7fb5f33867c0, tid=3044: replacing value for key 4. previous=(nil) new=0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's 0x1d36c80
  ### this is when I pasted some text from OOo ###
  pa_tls_get, pthread=0x7fb5f33867c0, tid=3044: got value for key 4, it's (nil)
  pa_tls_set, pthread=0x7fb5f33867c0, tid=3044: replacing value for key 4. previous=(nil) new=0x1d463f0
  Assertion 'pthread_setspecific(t->key, userdata) == 0' failed at pulsecore/thread-posix.c:216, function pa_tls_set(). Aborting.

Notice how the value for key 4 got erased suddenly without any pa_tls_*() calls in between the two consecutive calls to pa_tls_get(). This tells me that something other than PA fiddles with thread-specific data. A possible explanation could be that something called pthread_key_delete() in between and destroyed key 4.

So I ran abiword under gdb, placing breakpoints at pthread_key_create() and pthread_key_delete(). And really, this revealed about 80 calls to pthread_key_delete(), all with a backtrace like this:

  Breakpoint 2, pthread_key_delete (key=4) at pthread_key_delete.c:31
  31 if (__builtin_expect (key < PTHREAD_KEYS_MAX, 1))
  #0 pthread_key_delete (key=4) at pthread_key_delete.c:31
  #1 0x0000003d24038085 in xmlCleanupParser__internal_alias () at parser.c:14044
  #2 0x0000003d2363da57 in UT_XML::~UT_XML() () from /usr/lib64/libabiword-2.8.so
  #3 0x0000003d2363de0a in UT_XML_Decode(char const*) () from /usr/lib64/libabiword-2.8.so
  #4 0x0000003d235370b7 in AP_Prefs::loadBuiltinPrefs() () from /usr/lib64/libabiword-2.8.so
  #5 0x0000003d23537172 in AP_Prefs::fullInit() () from /usr/lib64/libabiword-2.8.so
  #6 0x0000003d23497ae7 in AP_UnixApp::initialize(bool) () from /usr/lib64/libabiword-2.8.so
  #7 0x0000003d23498169 in AP_UnixApp::main(char const*, int, char**) () from /usr/lib64/libabiword-2.8.so
  #8 0x0000003d1841eb1d in __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=<value optimized out>) at libc-start.c:226
  #9 0x0000000000400889 in _start ()

Deleting an already deleted key is clearly a bug. I believe this specific case is a bug in abiword - it should not call libxml2's xmlCleanupParser() unless it's going to exit really soon (a source comment in libxml2 has a big WARNING about it).

What does any of what I wrote have to do with empathy? I don't know. Maybe empathy misuses libxml2 in a similar way. I don't use empathy, and right now I'm on too crippled an Internet connection to even download it.

Ah, wonderful. That could be it. Empathy in fact *does* call that function quite often, judging by the code:

http://git.gnome.org/browse/gossip/tree/src/gossip-contact-groups.c#n224

Will reassign to empathy again.

Hmm, and a google code search kinda suggests that everyone and his dog is calling that function where he shouldn't.
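The failure mode described above can be reproduced in isolation. The sketch below is not code from PulseAudio, abiword or libxml2; the names `lib_a`, `lib_b` and `stale_delete_demo()` are made up for illustration. It assumes glibc behaviour (a freed key slot is handed out again by the next pthread_key_create(), and pthread_key_t values can be compared with ==). POSIX leaves the use of a deleted key undefined, so other C libraries may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* What the "victim" library observes after someone else deletes a stale key. */
struct outcome {
    int   same_slot;    /* 1 if the victim got the slot the other library freed */
    void *value_after;  /* pthread_getspecific() result after the stale delete  */
    int   set_rc;       /* pthread_setspecific() result after the stale delete  */
};

struct outcome stale_delete_demo(void)
{
    static int payload = 42;
    pthread_key_t lib_a, lib_b;
    struct outcome o;
    int rc;

    /* "Library A" creates a key and deletes it in its cleanup routine. */
    rc = pthread_key_create(&lib_a, NULL);
    assert(rc == 0);
    rc = pthread_key_delete(lib_a);
    assert(rc == 0);

    /* "Library B" creates its key afterwards and stores a value under it.
     * On glibc it receives the slot that A just released. */
    rc = pthread_key_create(&lib_b, NULL);
    assert(rc == 0);
    o.same_slot = (lib_a == lib_b);
    rc = pthread_setspecific(lib_b, &payload);
    assert(rc == 0);

    /* A runs its cleanup a second time with the stale key, which now
     * silently destroys the key that belongs to B. */
    (void) pthread_key_delete(lib_a);

    /* B has made no TLS calls in between, yet its value is gone and the
     * next store is rejected. */
    o.value_after = pthread_getspecific(lib_b);
    o.set_rc = pthread_setspecific(lib_b, &payload);
    return o;
}
```

On glibc this is expected to mirror the log above: the stored value reads back as NULL without any intervening pa_tls_*()-style call, and the following pthread_setspecific() returns EINVAL.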
I have also duplicated this now for abiword as bug 554899.

(In reply to comment #113)
> Hmm, and a google code search kinda suggests that everyone and his dog is
> calling that function where he shouldn't.

I will now blog about this, in an attempt to make people aware of that misuse.

Here's the fix: http://git.collabora.co.uk/?p=user/cassidy/empathy;a=commitdiff;h=ae0043914458e13bc2fdb8cecceeaf645153f35b

Michal and Lennart -- thank you both for running this to ground. Another pernicious problem of abusing PulseAudio can now be solved! :-) (And apologies for not having anything substantive to add to this bug other than a thank-you. I'll help publicize Lennart's blog entry though.)

empathy-2.28.2-2.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/empathy-2.28.2-2.fc12

empathy-2.28.2-2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update empathy'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-0581

*** Bugs 532106, 549392, 549361, 544814, 524506 have been marked as duplicates of this bug. ***

empathy-2.28.2-2.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 561059 has been marked as a duplicate of this bug. ***