Bug 71419
Summary: | nautilus getting confused and not starting or not activating existing copies | ||
---|---|---|---|
Product: | [Retired] Red Hat Public Beta | Reporter: | Chris Runge <crunge> |
Component: | nautilus | Assignee: | Havoc Pennington <hp> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jay Turner <jturner> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | null | CC: | alexl, srevivo, twaugh, wg |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2002-11-09 23:35:53 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 67217 | ||
Attachments: |
Description
Chris Runge
2002-08-13 12:37:14 UTC
Can you give this a try with bonobo-activation 1.0.3 and gnome-session 2.0.5 whenever those wander out to rawhide? gnome-session is somehow starting apps multiple times, and in the end not starting them at all; it's very strange... Need to find all the dups of this.

Ah, the default session file was all screwed up. Try gnome-session 2.0.5-2.

I thought this was fixed, but I'm still having problems, now in Null-re0816.3:

$ rpm -q gnome-session
gnome-session-2.0.5-3
$ rpm -q nautilus
nautilus-2.0.4-3

I am in runlevel 3. The system crashed due to what I believe is an SMP lockup (not sure whether this matters, but I thought I'd mention it). I rebooted the system and ran startx. Nautilus did not come up. ps -aex shows this:

1041 tty1 Z 0:00 [gnome-session <defunct>]
1042 ? R 0:32 nautilus --no-default-window --sm-client-id default3

I then exited GNOME/X to the console, checked, and nautilus process 1042 was still there. I then started X again; 1042 was still there and a new one had started:

1042 ? R 2:55 nautilus --no-default-window --sm-client-id default3
1204 ? S 0:00 nautilus --no-default-window --sm-client-id default3

but Nautilus still hadn't really started (no icons on the desktop). I then exited X again to the console, issued the command killall nautilus, and verified that all of the nautilus processes were killed. I issued a startx and nautilus was working again.

Can you attach ~/.gnome2/session please?

~/.gnome2/session doesn't exist:

$ ls -al ~/.gnome2
total 24
drwx------ 5 crunge crunge 4096 Aug 17 02:39 .
drwx------ 16 crunge crunge 4096 Aug 17 07:01 ..
drwx------ 2 crunge crunge 4096 Aug 16 20:34 accels
-rw-rw-r-- 1 crunge crunge 57 Aug 16 21:28 memprof
drwx------ 3 crunge crunge 4096 Aug 16 21:36 panel2.d
drwxr-xr-x 4 crunge crunge 4096 Aug 16 20:34 share

OK, I've seen this too. It doesn't feel like a gnome-session issue to me; I think it's nautilus or bonobo-activation causing multiple nautilus processes to start (and none of them ends up managing the desktop).
It may just be nautilus crashing or exiting at a strategic point. Perhaps a recent patch introduced that. Do you ever get a core file from nautilus?

coredumpsize=0, but I haven't seen a crash dialog...

I've seen this too. Has anyone managed to figure out where nautilus is busy-looping (when it busy-loops)?

Haven't investigated too much yet. Hard to reproduce. :-/ Probably involves logging out and back in though. I'm guessing one of the relatively recent nautilus changes broke it; I think I started seeing it in the last few weeks, so just reviewing those may be a start.

I discovered that on the system showing this, "gconftool-2 --get /apps/nautilus/preferences/add_to_session" printed "false". How this happened I don't know. But it would be good for other people to check whether it happens. If it seems to be happening a lot we should maybe hardcode that value, as it's just a debug setting and an env variable would be more useful for debugging anyhow.

2.0.5-2 will contain a hack to remove the add_to_session setting and always add to session. Also, do you have nautilus set to not render the desktop?

Whenever I've seen this, nautilus has been busy-looping, and when I kill it it starts as normal. So perhaps we aren't seeing the same thing after all. I have nautilus rendering the desktop, yes.

Is the value of gconftool-2 --get /apps/nautilus/preferences/add_to_session "true" for you? Maybe check that when it sticks, just to be sure. Conceivably this is also just a side-effect of some random memory corruption that is also causing #72236 ...

It's 'true' now, yes, and I haven't ever fiddled with that (nautilus rendering desktop). I'll check it if I see the problem again.

72515 is a dup; it suggests the cause is the machine crashing (abnormal nautilus exit).

Could it be the starthere-hackaround patch?

I don't see how. We should remove that patch anyway, though.
Still having the problem under Null. My problem sounds like twaugh's rather than the other issues: the gconftool-2 query returns true. Also, the problem starts after the system locks up (71738), so it may indeed be caused by an improper exit.

I looked in the changelog a month back or so, and I didn't see anything that affects startup. If someone can get this to happen under "strace -o output -f nautilus" that might be useful info. Also of course a backtrace (even without full symbols) from the stuck nautilus could help.

Just happened again. This is what gdb says:

#0 0x4206fde1 in malloc_consolidate () from /lib/i686/libc.so.6
#1 0x00000038 in ?? ()
Cannot access memory at address 0x38

Strace shows no output, and neither does ltrace. 'gconftool-2 --get /apps/nautilus/preferences/add_to_session' says 'true'. nautilus-2.0.5-3. I'll try to keep it running for a while if you like, so you can let me know what other things I can try.

Do we happen to have a pre-built debug package for this yet?

How do the other threads look?

That's the only one running.

Are you sure? (milan ps and top hide threads)

Oh, right, forgot that.
(gdb) info threads 10 Thread 8201 (LWP 24108) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 9 Thread 7176 (LWP 24107) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 8 Thread 6151 (LWP 24106) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 7 Thread 5126 (LWP 24105) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 6 Thread 4101 (LWP 24104) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 5 Thread 3076 (LWP 24103) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 4 Thread 2051 (LWP 24102) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 3 Thread 1026 (LWP 24101) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 2 Thread 2049 (LWP 24100) 0x420c3a2b in poll () from /lib/i686/libc.so.6 * 1 Thread 1024 (LWP 24092) 0x4206fdb5 in malloc_consolidate () from /lib/i686/libc.so.6 (gdb) bt #0 0x4206fdb5 in malloc_consolidate () from /lib/i686/libc.so.6 #1 0x00000038 in ?? () Cannot access memory at address 0x38 (gdb) thread 2 [Switching to thread 2 (Thread 2049 (LWP 24100))]#0 0x420c3a2b in poll () from /lib/i686/libc.so.6 (gdb) bt #0 0x420c3a2b in poll () from /lib/i686/libc.so.6 #1 0x408eacce in __pthread_manager () from /lib/i686/libpthread.so.0 (gdb) thread 3 [Switching to thread 3 (Thread 1026 (LWP 24101))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from 
/usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) (gdb) thread 4 [Switching to thread 4 (Thread 2051 (LWP 24102))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408e9f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0 #3 0x4075b3e6 in gnome_vfs_thread_pool_wait_for_work () from /usr/lib/libgnomevfs-2.so.0 #4 0x4075b43f in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #5 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #6 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) thread 5 [Switching to thread 5 (Thread 3076 (LWP 24103))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from 
/usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) (gdb) thread 6 [Switching to thread 6 (Thread 4101 (LWP 24104))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from 
/usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) thread 7 [Switching to thread 7 (Thread 5126 (LWP 24105))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) thread 8 [Switching to thread 8 (Thread 6151 (LWP 24106))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from 
/lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) thread 9 [Switching to thread 9 (Thread 7176 (LWP 24107))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from 
/usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 (gdb) thread 10 [Switching to thread 10 (Thread 8201 (LWP 24108))]#0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 (gdb) bt #0 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 #1 0x408ecfe8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0 #2 0x408ef330 in __pthread_alt_lock () from /lib/i686/libpthread.so.0 #3 0x408ebe97 in pthread_mutex_lock () from /lib/i686/libpthread.so.0 #4 0x42070630 in free () from /lib/i686/libc.so.6 #5 0x4205e1fc in fclose@@GLIBC_2.1 () from /lib/i686/libc.so.6 #6 0x40c62210 in read_saved_cached_trash_entries () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #7 0x40c62398 in find_cached_trash_entry_for_device () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #8 0x40c62523 in find_trash_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #9 0x40c62834 in do_find_directory () from /usr/lib/gnome-vfs-2.0/modules/libfile.so #10 0x407446dc in gnome_vfs_find_directory_cancellable () from /usr/lib/libgnomevfs-2.so.0 #11 0x4074c408 in execute_find_directory () from /usr/lib/libgnomevfs-2.so.0 #12 0x4074c941 in gnome_vfs_job_execute () from /usr/lib/libgnomevfs-2.so.0 #13 0x4074a269 in thread_routine () from /usr/lib/libgnomevfs-2.so.0 #14 0x4075b44e in thread_entry () from /usr/lib/libgnomevfs-2.so.0 #15 0x40935377 in g_thread_create_proxy () from /usr/lib/libglib-2.0.so.0 #16 0x408eb871 in pthread_start_thread () from /lib/i686/libpthread.so.0 This looks like an allocator deadlock. We're talking with uli on irc about it. 
Of course, it could also be a random memory corruption bug. <foo> single step a bit more and tell be the value of $ecx every time you reach 0x4206fd70? <twaugh> foo: Er.. I killed it already. <foo> darn <twaugh> Sorry. <foo> the code is iterating over a list <foo> it's important to know how many elements are in the list <twaugh> Next time I see it, I'll try that. <twaugh> foo: I guess it would be a double-free or something that corrupts it. <foo> might not be a corruption <foo> double free: maybe <foo> arbitrary memory corruption: unlikely <foo> checking for double-free is easy, enable mtrace (this requires putting some code at the beginning of main) ~ *fb = 0; 0x4206fd6a <malloc_consolidate+90>: movl $0x0,(%eax) ~ check_inuse_chunk(av, p); ~ size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA); 0x4206fd70 <malloc_consolidate+96>: mov 0x4(%ecx),%eax 17 ~ nextp = p->fd; 0x4206fd73 <malloc_consolidate+99>: mov 0x8(%ecx),%edi 18 ~ size = p->size & ~(PREV_INUSE|NON_MAIN_ARENA); 0x4206fd76 <malloc_consolidate+102>: mov %eax,%esi 19 0x4206fd78 <malloc_consolidate+104>: mov %edi,0x8(%esp,1) 20 0x4206fd7c <malloc_consolidate+108>: and $0xfffffffa,%esi 21 ~ nextchunk = chunk_at_offset(p, size); 0x4206fd7f <malloc_consolidate+111>: lea (%esi,%ecx,1),%edi 22 ~ nextsize = chunksize(nextchunk); 0x4206fd82 <malloc_consolidate+114>: mov 0x4(%edi),%ebp 23 0x4206fd85 <malloc_consolidate+117>: mov %ebp,%edx 24 0x4206fd87 <malloc_consolidate+119>: and $0xfffffff8,%edx 25 ~ if (!prev_inuse(p)) { 0x4206fd8a <malloc_consolidate+122>: and $0x1,%eax 26 0x4206fd8d <malloc_consolidate+125>: mov %edx,(%esp,1) 27 0x4206fd90 <malloc_consolidate+128>: ~ jne 0x4206fda4 <malloc_consolidate+148> 28 ~ prevsize = p->prev_size; ~ size += prevsize; ~ p = chunk_at_offset(p, -((long) prevsize)); ~ unlink(p, bck, fwd); 0x4206fd92 <malloc_consolidate+130>: mov (%ecx),%eax 0x4206fd94 <malloc_consolidate+132>: sub %eax,%ecx 0x4206fd96 <malloc_consolidate+134>: mov 0x8(%ecx),%edx 0x4206fd99 
<malloc_consolidate+137>: add %eax,%esi 0x4206fd9b <malloc_consolidate+139>: mov 0xc(%ecx),%eax 0x4206fd9e <malloc_consolidate+142>: mov %eax,0xc(%edx) 0x4206fda1 <malloc_consolidate+145>: mov %edx,0x8(%eax) ~ if (nextchunk != av->top) { 0x4206fda4 <malloc_consolidate+148>: mov 0x28(%esp,1),%eax 29 0x4206fda8 <malloc_consolidate+152>: cmp 0x54(%eax),%edi 30 0x4206fdab <malloc_consolidate+155>: ~ je 0x4206fe18 <malloc_consolidate+264> 31 ~ nextinuse = inuse_bit_at_offset(nextchunk, nextsize); 0x4206fdad <malloc_consolidate+157>: mov (%esp,1),%edx 32 ~ if (!nextinuse) { 0x4206fdb0 <malloc_consolidate+160>: testb $0x1,0x4(%edx,%edi,1) 33 0x4206fdb5 <malloc_consolidate+165>: ~ jne 0x4206fe10 <malloc_consolidate+256> 34 ~ size += nextsize; ~ unlink(nextchunk, bck, fwd); 0x4206fdb7 <malloc_consolidate+167>: mov 0x8(%edi),%ebp 0x4206fdba <malloc_consolidate+170>: add %edx,%esi 0x4206fdbc <malloc_consolidate+172>: mov 0xc(%edi),%eax 0x4206fdbf <malloc_consolidate+175>: mov %eax,0xc(%ebp) 0x4206fdc2 <malloc_consolidate+178>: mov%ebp,0x8(%eax) ~ first_unsorted = unsorted_bin->fd; ~ unsorted_bin->fd = p; ~ first_unsorted->bk = p; ~ set_head(p, size | PREV_INUSE); ~ p->bk = unsorted_bin; ~ p->fd = first_unsorted; ~ set_foot(p, size); 0x4206fdc5 <malloc_consolidate+181>: mov %esi,(%esi,%ecx,1) 3 0x4206fdc8 <malloc_consolidate+184>: mov 0x4(%esp,1),%eax 4 0x4206fdcc <malloc_consolidate+188>: mov %esi,%edx 5 0x4206fdce <malloc_consolidate+190>: or $0x1,%edx 6 0x4206fdd1 <malloc_consolidate+193>: mov 0x8(%eax),%edi 7 0x4206fdd4 <malloc_consolidate+196>: mov %ecx,0x8(%eax) 8 0x4206fdd7 <malloc_consolidate+199>: mov 0x4(%esp,1),%eax 9 0x4206fddb <malloc_consolidate+203>: mov %ecx,0xc(%edi) 10 0x4206fdde <malloc_consolidate+206>: mov %edx,0x4(%ecx) 11 0x4206fde1 <malloc_consolidate+209>: mov %eax,0xc(%ecx) 12 0x4206fde4 <malloc_consolidate+212>: mov %edi,0x8(%ecx) 13 ~ } while ( (p = nextp) != 0); 0x4206fde7 <malloc_consolidate+215>: mov 0x8(%esp,1),%ecx 14 0x4206fdeb 
<malloc_consolidate+219>: test %ecx,%ecx 15 0x4206fded <malloc_consolidate+221>: ~ jne 0x4206fd70 <malloc_consolidate+96> 16 0x4206fdef <malloc_consolidate+223>: mov 0x10(%esp,1),%eax 0x4206fdf3 <malloc_consolidate+227>: addl $0x4,0x10(%esp,1) 0x4206fdf8 <malloc_consolidate+232>: cmp 0xc(%esp,1),%eax 0x4206fdfc <malloc_consolidate+236>: ~ jne 0x4206fd5c <malloc_consolidate+76> 0x4206fe02 <malloc_consolidate+242>: add $0x14,%esp 0x4206fe05 <malloc_consolidate+245>: pop %ebx 0x4206fe06 <malloc_consolidate+246>: pop %esi 0x4206fe07 <malloc_consolidate+247>: pop %edi 0x4206fe08 <malloc_consolidate+248>: pop %ebp 0x4206fe09 <malloc_consolidate+249>: ret 0x4206fe0a <malloc_consolidate+250>: lea 0x0(%esi),%esi clear_inuse_bit_at_offset(nextchunk, 0); 0x4206fe10 <malloc_consolidate+256>: and $0xfffffffe,%ebp 35 0x4206fe13 <malloc_consolidate+259>: mov %ebp,0x4(%edi) 1 0x4206fe16 <malloc_consolidate+262>: ~ jmp 0x4206fdc5 <malloc_consolidate+181> 2 <foo> alex, twaugh: the malloc maintainer thinks that it /could/ be a double free <foo> alex, twaugh: so, keep on trying to MALLOC_DEBUG_ I have another trace. This time, mtrace() was called in nautilus-main.c:main, and debugging symbols are intact in the nautilus package (that I built). 
(gdb) info threads 9 Thread 7176 (LWP 9863) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 8 Thread 6151 (LWP 9862) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 7 Thread 5126 (LWP 9861) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 6 Thread 4101 (LWP 9860) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 5 Thread 3076 (LWP 9859) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 4 Thread 2051 (LWP 9858) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 3 Thread 1026 (LWP 9857) 0x420285a9 in sigsuspend () from /lib/i686/libc.so.6 2 Thread 2049 (LWP 9856) 0x420c3a2b in poll () from /lib/i686/libc.so.6 * 1 Thread 1024 (LWP 9846) 0x408ebcea in pthread_mutex_trylock () from /lib/i686/libpthread.so.0 (gdb) bt #0 0x408ebcea in pthread_mutex_trylock () from /lib/i686/libpthread.so.0 #1 0x302b3063 in ?? () #2 0x4206ee20 in malloc () from /lib/i686/libc.so.6 #3 0x40922d39 in g_malloc () from /usr/lib/libglib-2.0.so.0 #4 0x40932263 in g_strsplit () from /usr/lib/libglib-2.0.so.0 #5 0x4070e530 in ltable_insert () from /usr/lib/libgconf-2.so.4 #6 0x4070e202 in gconf_listeners_add () from /usr/lib/libgconf-2.so.4 #7 0x4071ff45 in gconf_client_notify_add () from /usr/lib/libgconf-2.so.4 #8 0x40118ecc in eel_gconf_notification_add () from /usr/lib/libeel-2.so.2 #9 0x40143700 in preferences_entry_ensure_gconf_connection () from /usr/lib/libeel-2.so.2 #10 0x08094d0e in fm_directory_view_init (view=0x8236598) at fm-directory-view.c:1342 #11 0x408c943b in g_type_create_instance () from /usr/lib/libgobject-2.0.so.0 #12 0x408b364f in g_object_constructor () from /usr/lib/libgobject-2.0.so.0 #13 0x408b2e5e in g_object_newv () from /usr/lib/libgobject-2.0.so.0 #14 0x408b361f in g_object_new_valist () from /usr/lib/libgobject-2.0.so.0 #15 0x408b2c16 in g_object_new () from /usr/lib/libgobject-2.0.so.0 #16 0x0805f9f0 in create_object (servant=0x812ec04, iid=0x408dde68 "\204m\003", ev=0xbffff7a0) at nautilus-application.c:119 #17 0x407708c1 in 
_ORBIT_skel_small_Bonobo_GenericFactory_createObject () from /usr/lib/libbonobo-activation.so.4 #18 0x407a3267 in ORBit_POAObject_invoke () from /usr/lib/libORBit-2.so.0 #19 0x407a7275 in ORBit_OAObject_invoke () from /usr/lib/libORBit-2.so.0 #20 0x40796093 in ORBit_small_invoke_adaptor () from /usr/lib/libORBit-2.so.0 #21 0x407a3741 in ORBit_POAObject_handle_request () from /usr/lib/libORBit-2.so.0 #22 0x407a3a71 in ORBit_POA_handle_request () from /usr/lib/libORBit-2.so.0 #23 0x407a717c in ORBit_handle_request () from /usr/lib/libORBit-2.so.0 #24 0x40791b55 in giop_connection_handle_input () from /usr/lib/libORBit-2.so.0#25 0x4089a8dd in linc_connection_io_handler () from /usr/lib/liblinc.so.1 #26 0x4089c640 in linc_source_dispatch () from /usr/lib/liblinc.so.1 #27 0x4091cf65 in g_main_dispatch () from /usr/lib/libglib-2.0.so.0 #28 0x4091df98 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0 #29 0x4091e2ad in g_main_context_iterate () from /usr/lib/libglib-2.0.so.0 #30 0x4091ea1f in g_main_loop_run () from /usr/lib/libglib-2.0.so.0 #31 0x4043b39f in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0 #32 0x0806834a in main (argc=1093664800, argv=0xbffffbc4) at nautilus-main.c:265 #33 0x420155c4 in __libc_start_main () from /lib/i686/libc.so.6 Are we confident that with current libc if it were a double free, MALLOC_CHECK_=2 would still make the allocator abort() in the second free? Since this is hanging in malloc() then if we were confident of that, we could rule out a double free presumably. I'm running this MALLOC_CHECK_ unset now, in favour of mtrace. But we lost the mtrace log because rpmq calls mtrace() itself (and I happened to run rpm -q). When we had the mtrace log, there was no report of a double-free according to mtrace(1). Created attachment 73325 [details]
typescript of gdb session; nautilus has symbols; MALLOC_TRACE set (and mtrace() call added), MALLOC_CHECK_ unset
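As a rough illustration of the "putting some code at the beginning of main" step that the IRC excerpt and this attachment refer to, here is a minimal sketch of enabling glibc's allocation tracing. The helper names start_malloc_trace and traced_workload and the log path used in the note below are assumptions for illustration; they are not taken from the nautilus sources.

```c
#include <assert.h>
#include <mcheck.h>   /* mtrace(), muntrace(): glibc-specific */
#include <stdlib.h>   /* setenv(), malloc(), free() */

/* Hypothetical helper (not from nautilus): turn on glibc allocation
 * tracing. mtrace() writes its log to the file named by the
 * MALLOC_TRACE environment variable and does nothing if it is unset. */
int start_malloc_trace(const char *log_path)
{
    assert(log_path != NULL);
    /* Third argument 0: keep a MALLOC_TRACE the user already exported. */
    if (setenv("MALLOC_TRACE", log_path, 0) != 0)
        return -1;
    mtrace();
    return 0;
}

/* Hypothetical workload: one allocation with a matching free, then
 * tracing is switched off again. Returns 0 on success. */
int traced_workload(void)
{
    char *p = malloc(32);
    if (p == NULL)
        return -1;
    p[0] = '\0';
    free(p);
    muntrace();
    return 0;
}
```

In a real program, start_malloc_trace("/tmp/nautilus-mtrace.log") would be the first statement in main(), and the resulting log would then be fed to the mtrace(1) script together with the binary. Newer glibc releases may need an extra malloc debugging library preloaded before any log is written, so treat this as a sketch rather than a recipe.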
thread 1 is looping in arena_get2(). Looks like this loop:

  /* Check the global, circularly linked list for available arenas. */
 repeat:
  do {
    if(!mutex_trylock(&a->mutex)) {
      THREAD_STAT(++(a->stat_lock_loop));
      tsd_setspecific(arena_key, (Void_t *)a);
      return a;
    }
    a = a->next;
  } while(a != a_tsd);

  /* If not even the list_lock can be obtained, try again. This can
     happen during `atfork', or for example on systems where thread
     creation makes it temporarily impossible to obtain _any_
     locks. */
  if(mutex_trylock(&list_lock)) {
    a = a_tsd;
    goto repeat;
  }

This seems to indicate that all arenas are locked and so is the list_lock. This could be right, since one of the other threads is blocked in fork; in fact, in ptmalloc_lock_all(), which is called at_fork to grab all the thread locks. Strange that it blocked though. This seems to be *another* deadlock.

Haven't managed to reproduce the problem with MALLOC_CHECK_=2. Still trying.

I tried about 100 times. Then I got my fiancée to try. First time, she got a core file. :-)

#0 0x42028501 in kill () from /lib/i686/libc.so.6
#1 0x408edf3d in raise () from /lib/i686/libpthread.so.0
#2 0x420298dc in abort () from /lib/i686/libc.so.6
#3 0x42071368 in free_check () from /lib/i686/libc.so.6
#4 0x420705c5 in free () from /lib/i686/libc.so.6
#5 0x0806841e in nautilus_navigation_bar_unimplemented_get_location ()
#6 0x420155c4 in __libc_start_main () from /lib/i686/libc.so.6

In other words, this is the second call of free() for the same object.

This time with symbols:

#0 0x42028501 in kill () from /lib/i686/libc.so.6
#1 0x408edf3d in raise () from /lib/i686/libpthread.so.0
#2 0x420298dc in abort () from /lib/i686/libc.so.6
#3 0x42071368 in free_check () from /lib/i686/libc.so.6
#4 0x420705c5 in free () from /lib/i686/libc.so.6
#5 0x0806841e in nautilus_navigation_bar_unimplemented_get_location () at nautilus-navigation-bar.c:48
#6 0x420155c4 in __libc_start_main () from /lib/i686/libc.so.6

not that it particularly helps.
But it's here:

EEL_IMPLEMENT_MUST_OVERRIDE_SIGNAL (nautilus_navigation_bar, get_location)

Core file (from nautilus-2.0.5-3) at ~twaugh/nautilus-core.

This isn't looking promising:

#define EEL_IMPLEMENT_MUST_OVERRIDE_SIGNAL(prefix, signal) \
\
static void \
prefix##_unimplemented_##signal (void) \
{ \
	g_warning ("failed to override signal " #prefix "->" #signal); \
}

No call to free in there...

Groan. I'd built an unstripped nautilus binary and just pointed gdb at that. Guess that doesn't work. I'll build an unstripped package and install that. Hopefully I'll be able to get a core from it again.

... however, even when I pointed gdb at the actual executable that dumped core, it said that it was nautilus_navigation_bar_unimplemented_get_location at fault. Does g_warning do _no_ memory allocation?

g_warning does do memory allocation; however, there should be a few stack frames in there, probably g_log, g_logv, g_free at least.

I compiled nautilus with a call to mcheck_pedantic(abort) as the first thing in main(). No core dump after two runs.

crunge.com: Do you know a way to reproduce this bug at will?

bug 70873 (smb hang) has the same backtrace (malloc_consolidate).

Created attachment 73507 [details]
This gdb session is from a nautilus package compiled with a call to mcheck_pedantic() in main(). It's the nautilus process started on login.
Created attachment 73508 [details]
And another, same conditions.
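The arena_get2() loop quoted earlier in this thread keeps circling a ring of arenas with trylock until one lock can be taken. The following is a simplified sketch of that scanning pattern only; it is not glibc's code. The struct arena layout, the function names, and the max_passes bound are assumptions made for illustration, and the list_lock retry condition from the quoted source is deliberately left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a malloc arena: a lock plus a ring link. */
struct arena {
    pthread_mutex_t mutex;
    struct arena *next;
};

/* Walk the ring starting at `start` and return the first arena whose
 * lock could be taken (the lock is still held on return). Returns NULL
 * after `max_passes` full laps in which every arena was busy. The loop
 * quoted in this thread has no such bound, which is why a thread can
 * spin there while another thread holds every arena lock. */
struct arena *grab_free_arena(struct arena *start, int max_passes)
{
    assert(start != NULL);
    for (int pass = 0; pass < max_passes; pass++) {
        struct arena *a = start;
        do {
            if (pthread_mutex_trylock(&a->mutex) == 0)
                return a;
            a = a->next;
        } while (a != start);
    }
    return NULL;
}

/* Link `n` arenas into a ring and initialise their locks. */
void init_arena_ring(struct arena *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_init(&ring[i].mutex, NULL);
        ring[i].next = &ring[(i + 1) % n];
    }
}
```

With every mutex in the ring held, grab_free_arena() gives up and returns NULL, whereas an unbounded version of the same loop would busy-loop, which matches the symptom reported for thread 1.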
I think I might have a handle on this ... it looks like two threads in gnome-vfs are processing a single list without locking and both trying to free the same element. Need to investigate further how it *should* work.

gnome-vfs2-2.0.2-5 seems to fix the problem I was seeing, and looks likely to fix the above, though there certainly could be something else going on as well. I'm going to mark it MODIFIED; testing would be very much appreciated. (I've put the package at http://people.redhat.com/otaylor/tmp/gnome-vfs2-test until rawhide propagates.)

I think the crashing/unresponsive behavior I'm seeing is related. The easiest way for me to recreate it is to have Mozilla running, then try to open a nautilus window by double-clicking on the home icon on my desktop. The icon will stay highlighted, nautilus never opens a file manager window, and a minute or so later the icons disappear. Sometimes they reappear much later, as if nautilus respawned after a timeout period, but the new icons don't respond to double-clicking. This is the process that normally hangs around, although I have seen the throbber running as well:

[tomg@gemini tomg]$ ps -ef | grep nautilus
tomg 875 1 0 13:14 ? 00:00:00 nautilus --sm-config-prefix /nau

FWIW, when nautilus is on the fritz like this I cannot start gedit from the applications menu either. I attached gdb to the running process and tried to get a backtrace, but I'm not sure that the trace has anything that hasn't been seen before. I've attached it anyway, just in case. I installed the latest nautilus from rawhide and the gnome-vfs2 linked above, but neither solved the problem.

[tomg@gemini tomg]$ rpm -q gnome-vfs2
gnome-vfs2-2.0.2-5
[tomg@gemini tomg]$ rpm -q gnome-vfs2-devel
gnome-vfs2-devel-2.0.2-5
[tomg@gemini tomg]$ rpm -q nautilus
nautilus-2.0.6-1
[tomg@gemini tmp]$ rpm -q gnome-session
gnome-session-2.0.5-4

Created attachment 74430 [details]
backtrace from tomg
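The gnome-vfs diagnosis earlier in this thread (two threads processing a single list without locking, both trying to free the same element) describes a classic race. The sketch below shows one common way to avoid it: detach the element while holding a mutex, so that exactly one caller ends up owning and freeing it. This is not the actual gnome-vfs fix; the work_list type and its functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical singly linked work list shared between worker threads. */
struct node {
    int value;
    struct node *next;
};

struct work_list {
    pthread_mutex_t lock;
    struct node *head;
};

void work_list_init(struct work_list *wl)
{
    pthread_mutex_init(&wl->lock, NULL);
    wl->head = NULL;
}

/* Push a value; returns 0 on success, -1 if allocation fails. */
int work_list_push(struct work_list *wl, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    pthread_mutex_lock(&wl->lock);
    n->next = wl->head;
    wl->head = n;
    pthread_mutex_unlock(&wl->lock);
    return 0;
}

/* Detach the head while holding the lock, so only one caller can ever
 * obtain a given node; that caller frees it after unlocking.
 * Returns 1 and stores the value in *out, or 0 if the list is empty. */
int work_list_pop(struct work_list *wl, int *out)
{
    assert(out != NULL);
    pthread_mutex_lock(&wl->lock);
    struct node *n = wl->head;
    if (n != NULL)
        wl->head = n->next;
    pthread_mutex_unlock(&wl->lock);
    if (n == NULL)
        return 0;
    *out = n->value;
    free(n);
    return 1;
}
```

Without the lock, two threads could both read the same head pointer and both call free() on it, which is the kind of double free that MALLOC_CHECK_=2 aborted on in the free_check() backtrace above.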
Hmm, if it really was two threads freeing the same chunk (of a list), I think MALLOC_CHECK_=2 should have produced a core every time. However, note that MALLOC_CHECK_ does affect the timing, because allocation is effectively single-threaded when using it... If you continue to reproduce the hang in malloc_consolidate, please let me know and I'll try to build the nautilus beta (I already got stuck at building glib2.0 with debian patches :-( ).

OK, looks like this is still ongoing, so I'll just leave it sitting in Modified for a while.

I haven't seen this problem on Psyche gold. The gnome-vfs change seems to have fixed it indeed.