Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 786170 - gam_server deadlocks, leading to all KDE applications hanging at start
Summary: gam_server deadlocks, leading to all KDE applications hanging at start
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gamin
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Tomáš Bžatek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-01-31 16:06 UTC by Sandro Mani
Modified: 2015-03-03 23:04 UTC (History)
4 users (show)

Fixed In Version: gamin-0.1.10-12.fc18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-08 16:28:32 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Backtrace (1.73 KB, text/plain)
2012-01-31 16:06 UTC, Sandro Mani
no flags Details
Patch (586 bytes, patch)
2012-02-02 22:14 UTC, Sandro Mani
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Debian BTS 542361 0 None None None 2012-06-29 20:37:49 UTC
GNOME Bugzilla 667230 0 None None None 2012-06-29 20:00:18 UTC
GNOME Bugzilla 669292 0 None None None Never
KDE Software Compilation 281312 0 None None None Never

Description Sandro Mani 2012-01-31 16:06:18 UTC
Created attachment 558641 [details]
Backtrace

Description of problem:
Occasionally, gam_server deadlocks (see attached backtrace). When this happens, all KDE applications hang at start. Killing gam_server allows the user to regain an operational desktop.


Version-Release number of selected component (if applicable):
gamin-0.1.10-11.fc17.x86_64

How reproducible:
Unfrequently

Steps to Reproduce:
Hard, it appears to be more likely to happen when attempting to launch a KDE application when the machine is out of RAM and swapping.
  
Actual results:
gam_server hangs, an so do all applications using it (i.e. attempt to connect to /tmp/fam-$username-)

Additional info:
Backtrace attached

Comment 1 Sandro Mani 2012-02-02 22:14:12 UTC
Looking at gamin-0.1.10/server/inotify-helper.c, there is an obvious double lock:
ih_sub_foreach_worker locks inotify_lock on line 153, and calls ih_sub_cancel on line 162, which also attempts to lock inotify_lock on line 130. The backtrace indeed shows that gam_server is stuck ad inotify-helper.c::130.

I would suggest removing the G_LOCK and G_UNLOCK in ih_sub_cancel, since it is only called by ih_sub_foreach_worker anyway. I have made a corresponding patch.

Comment 2 Sandro Mani 2012-02-02 22:14:56 UTC
Created attachment 559148 [details]
Patch

Comment 3 Sandro Mani 2012-02-02 22:16:23 UTC
Adding rdieter.edu to CC per http://lists.fedoraproject.org/pipermail/users/2011-December/411145.html

Comment 4 Tomáš Bžatek 2012-02-03 13:17:12 UTC
FYI, this piece of code is embedded in glib, should be fixed there too.
/me checks...

Comment 5 Tomáš Bžatek 2012-02-08 16:28:32 UTC
Confirming this doesn't affect glib, ih_sub_cancel() is not used in nested calls and there should not be any lock held. In fact, glib uses slightly newer sources from John McCutchan.

Posted my comments in upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=669292

Fixed in gamin-0.1.10-12.fc18

Comment 6 Rex Dieter 2012-06-29 20:37:49 UTC
marked a few more upstream references

Had someone in #fedora-kde irc today seemingly get hit by this already running gamin-0.1.10-12.fc17 , will keep an eye out and re-open if we can get more debugging done next time (instead of simply killing gam_server)

Comment 7 Gerd v. Egidy 2013-03-03 20:57:31 UTC
I hit this every few weeks, currently running gamin-0.1.10-12.fc17.x86_64. Today I had enough time to strace it, find that gam_server is the culprit and found this bugzilla entry.

What do you want me to do to further analyze this when I hit it next time?

Comment 8 Sandro Mani 2013-03-03 21:03:12 UTC
Do (as root)

debuginfo-install gamin

then run (as normal user)

gdb -p$(pidof gam_server)

and then type

backtrace

and attach the backtrace.

Comment 9 Sandro Mani 2013-03-03 21:04:12 UTC
Ups, should be

gdb -p $(pidof gam_server)

(note the space between p and $)

Comment 10 Gerd v. Egidy 2013-03-03 22:39:39 UTC
Thanks. So just a regular backtrace.

Now I remembered how to trigger this: open dolphin and navigate to a directory with many subdirs (about 600 in my case). Scroll a bit around in the view or enter a subdir. And it hits you within a minute or so. BTW, this dir is mounted via NFS4, if that matters.

Here is the backtrace:

#0  0x00007f506864c950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=9, data=data@entry=0x7fff62545660, len=len@entry=59) at gam_channel.c:826
#2  0x0000000000408794 in gam_send_event (conn=conn@entry=0x14dfb60, reqno=90, event=<optimized out>, path=0x15105b0 "###DIRNAME###", len=49)
    at gam_connection.c:609
#3  0x000000000040a36d in gam_eq_flush_callback (conn=0x14dfb60, event=0x1510810, eq=<optimized out>) at gam_eq.c:118
#4  gam_eq_flush (eq=0x14dd130, conn=conn@entry=0x14dfb60) at gam_eq.c:137
#5  0x00000000004081ea in gam_connection_eq_flush (data=0x14dfb60, data@entry=<error reading variable: value has been optimized out>) at gam_connection.c:174
#6  0x00007f50689673bb in g_timeout_dispatch (source=source@entry=0x1511a40, callback=<optimized out>, user_data=<optimized out>) at gmain.c:3882
#7  0x00007f5068966825 in g_main_dispatch (context=0x14dd4d0) at gmain.c:2539
#8  g_main_context_dispatch (context=context@entry=0x14dd4d0) at gmain.c:3075
#9  0x00007f5068966b58 in g_main_context_iterate (context=0x14dd4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#10 0x00007f5068966f52 in g_main_loop_run (loop=0x14df160) at gmain.c:3340
#11 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647

I edited the entry ###DIRNAME###, there was the name of one of my directories in it. ###DIRNAME### was not the current dir I was in, but one up in hierarchy and then two below. Don't know what dolphin was telling gam to do there, I never entered that ###DIRNAME### or the one above in this whole KDE session.

Version is gamin-0.1.10-12.fc17.x86_64,
Signature   : RSA/SHA256, Wed 08 Feb 2012 05:46:09 PM CET, Key ID 50e94c991aca3465
Build Date  : Wed 08 Feb 2012 05:27:13 PM CET
Build Host  : x86-11.phx2.fedoraproject.org

Comment 11 Gerd v. Egidy 2013-03-03 22:50:01 UTC
Next one, tried it in another big dir hierarchy. Can reproduce it within minutes now.

#0  0x00007fb88cfd6950 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000408106 in gam_client_conn_write (source=<optimized out>, fd=20, data=data@entry=0x7fff0bf90350, len=len@entry=51) at gam_channel.c:826
#2  0x0000000000408b6d in gam_send_ack (conn=conn@entry=0xf3e9d0, reqno=1009, path=path@entry=0xff1230 "##OTHERDIR##", len=len@entry=41) at gam_connection.c:686
#3  0x0000000000409020 in gam_connection_request (req=0xf3ee02, conn=0xf3e9d0) at gam_connection.c:411
#4  gam_connection_data (conn=conn@entry=0xf3e9d0, len=<optimized out>) at gam_connection.c:509
#5  0x0000000000407adf in gam_client_conn_read (source=0xf8f980, condition=<optimized out>, info=0xf3e9d0) at gam_channel.c:283
#6  0x00007fb88d2f0825 in g_main_dispatch (context=0xf2f4d0) at gmain.c:2539
#7  g_main_context_dispatch (context=context@entry=0xf2f4d0) at gmain.c:3075
#8  0x00007fb88d2f0b58 in g_main_context_iterate (context=0xf2f4d0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3146
#9  0x00007fb88d2f0f52 in g_main_loop_run (loop=0xf31160) at gmain.c:3340
#10 0x00000000004035e6 in main (argc=<optimized out>, argv=<optimized out>) at gam_server.c:647

##OTHERDIR## again is a dir that I did not open. This time it is below the current one.

Comment 12 Sandro Mani 2013-03-03 22:54:34 UTC
This is not the deadlock reported here, so it's better to file a new bug. While I'm at it, when gam_server is hanging, who is the file descriptor fd=20? Look at 

ls -l /proc/$(pidof gam_server)/fd

Comment 13 Gerd v. Egidy 2013-03-04 22:29:30 UTC
Ok, I filed a new bug for this: #917848


Note You need to log in before you can comment on or make changes to this bug.