Bug 428329
| Field | Value |
|---|---|
| Summary | Oops: unable to handle kernel paging request at virtual address 60001018 |
| Product | [Fedora] Fedora |
| Component | kernel |
| Status | CLOSED WONTFIX |
| Severity | high |
| Priority | low |
| Version | 8 |
| Hardware | i686 |
| OS | Linux |
| Reporter | Christopher Beland <beland> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | esandeen, james, jfrieben |
| Doc Type | Bug Fix |
| Last Closed | 2009-01-09 05:45:26 UTC |
Description
Christopher Beland
2008-01-10 20:49:37 UTC
Created attachment 291329 [details]
Output in /var/log/messages
Hmm, that 60001018 looks familiar... Bug 270141 and bug 426863 appear to be the same thing.

       f: 55                   push %ebp
      10: 57                   push %edi
      11: 89 c7                mov %eax,%edi
      13: 56                   push %esi
      14: 53                   push %ebx
      15: 8b 70 bc             mov 0xffffffbc(%eax),%esi   ... esi <- block_i
      18: 8b 80 9c 00 00 00    mov 0x9c(%eax),%eax         ... eax <- ei
      1e: 85 f6                test %esi,%esi
      20: 8b 98 64 01 00 00    mov 0x164(%eax),%ebx        ... eax <- rsv_lock
      26: 74 2f                je 0x57
      28: 8d 6e 14             lea 0x14(%esi),%ebp         ... ebp <- rsv

So block_i == 60001000
rsv == 60001014

    00000000 <.text>:
       0: 83 7d 04 00          cmpl $0x0,0x4(%ebp)   OOPS => if (!rsv_is_empty(&rsv->rsv_window)) {
       4: 74 26                je 0x2c
       6: 8d 83 00 41 00 00    lea 0x4100(%ebx),%eax
       c: e8 72 a6 cf d1       call 0xd1cfa683
      11: 83 7d 04 00          cmpl $0x0,0x4(%ebp)

Seems to happen more than once in this code.

hm, kswapd thread in the other bug too (can't tell on the 3rd bug, truncated oops)....

Christopher, I see that you reported all 3 bugs, #428329, #270141, and #426863... feel free to just re-open or update an existing bug next time :)

It's interesting that you have the same bad value in all 3...

Similar oops: http://article.gmane.org/gmane.linux.kernel/584999

It's also in kswapd, but it has a different bad address. Nice long discussion there, but no good ideas.

Seems to me like either something stomped on this memory, or use after free... tho that seems unlikely since you have the same bad value all 3x. I'd ask you to run memtest86 but since one other person hit the same thing in the same callchain... hrm.

Christopher, how often are you hitting this? I wonder if running with a debug kernel variant would yield any more info next time, if you hit it often.

http://ubuntuforums.org/archive/index.php/t-337310.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0603.2/0403.html
https://bugzilla.novell.com/show_bug.cgi?id=213905

I'm going to ask after all... will you go ahead & let memtest run for a while?

> Christopher, how often are you hitting this?
When I get a kernel oops, I almost always report it. I don't know why I didn't
find the previous reports, but they are probably the only times I've seen this
problem.
I ran memtest last night and it did not detect any problems.
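
For reference, the address arithmetic from the disassembly analysis earlier in this thread can be checked mechanically. The following is only a sketch: the constants are copied from the annotated instructions (`lea 0x14(%esi),%ebp` and `cmpl $0x0,0x4(%ebp)`), the function names are hypothetical, and nothing here inspects a real kernel.

```python
# Offsets read off the annotated disassembly quoted in this bug.
RSV_OFFSET = 0x14    # lea 0x14(%esi),%ebp   -> rsv = block_i + 0x14
FIELD_OFFSET = 0x04  # cmpl $0x0,0x4(%ebp)   -> reads the word at rsv + 0x4


def faulting_address(block_i: int) -> int:
    """Address touched by the cmpl, given the block_i pointer held in %esi."""
    rsv = block_i + RSV_OFFSET
    return rsv + FIELD_OFFSET


def block_i_from_fault(fault: int) -> int:
    """Work backwards from the oops address to the bad block_i pointer."""
    return fault - FIELD_OFFSET - RSV_OFFSET


if __name__ == "__main__":
    # The oops reports virtual address 60001018.
    print(hex(block_i_from_fault(0x60001018)))  # 0x60001000
    print(hex(faulting_address(0x60001000)))    # 0x60001018
```

This agrees with the values stated in the analysis (block_i == 60001000, rsv == 60001014) and with the faulting address in the bug summary.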
This is very odd. Enough people have hit this, it seems real, but there's not a lot to go on. Running with a debug kernel, in hopes that it gets hit again, might offer some clues. Or, setting up to get a system dump on an oops would also be a huge help. I'll look through the code some more...

I'm not sure what you mean by "debug kernel". I do have the kernel-debuginfo and kernel-debuginfo-common RPMs installed. I read through this guide to kdump:
http://fedoranews.org/mediawiki/index.php/Using_Kexec_and_Kdump_in_Rawhide
which explains how to get a system dump from a kernel panic. Will a dump get triggered automatically on oops if I put "crashkernel=64M@16M" in the kernel's startup command in grub, and I do "chkconfig kdump on"?

This just happened again, and once again fsck didn't find any problems in the filesystem. I haven't used my wireless card since my last reboot, so that can't be the cause. I suspend and resume all the time, but one thing I did yesterday that I don't usually do is hibernate and resume. I've taken the kdump-related steps in comment 11, in case that helps. If there are any hibernation-related diagnostics I should do, let me know.

Sorry I didn't get back to you yet on comment #11. Honestly, I always have to read up on how to make kdump work, myself. :) So, did you get a system dump then? If you can share that it might yield some very good clues!
Thanks,
-Eric

Alas, no, I hadn't added the kernel argument before the latest oops. But I've just hibernated and resumed, so I'm expecting it to happen again at any moment...

If you didn't get a dump this time, I think you can test it with
    echo c > /proc/sysrq-trigger
(maybe preceded by a few "syncs" for the filesystem's benefit) to see if it's working, before the next fleeting panic.

Hm, and another report somewhat along the same lines...
http://article.gmane.org/gmane.linux.kernel/626582
and a bug I closed due to tainting, but perhaps related:
https://bugzilla.redhat.com/show_bug.cgi?id=208488

I just read at http://www.ibm.com/developerworks/library/l-fs8.html that some laptop hard drives throw away their write caches when being put into a low-power state, which can cause filesystem corruption. Could this cause memory corruption when I hibernate, if I am unlucky in my timing?

I'd have to read up on how hibernate works, but I thought that at least recent code did a block device freeze, which should get everything safely on disk...

The same article says that some hard drives say they have committed things to disk from write cache when they actually haven't, and problems obviously result when the two are combined.

Created attachment 294937 [details]
Latest oops
Another oops, this time at "EIP is at ext3_discard_reservation+0x1c/0x4d
[ext3]". I didn't get a dump in /var/crash, though, and I'm not sure why. I
did reboot from a LiveCD (to do an fsck) before rebooting again from my hard
drive. (fsck -f didn't find any filesystem problems.)
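
One thing worth ruling out when no dump appears is whether the running kernel was actually booted with the crashkernel= argument discussed earlier. The sketch below is an illustration only: `parse_crashkernel` is a hypothetical helper, and it assumes the simple `size[KMG][@offset[KMG]]` form used in this thread (for example crashkernel=64M@16M), not any other syntax.

```python
import re
from typing import Optional, Tuple

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
# Matches only the simple form: crashkernel=size[KMG][@offset[KMG]]
_CRASHKERNEL = re.compile(r"^crashkernel=(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$")


def parse_crashkernel(cmdline: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (size_bytes, offset_bytes or None), or None if the argument is absent."""
    for token in cmdline.split():
        match = _CRASHKERNEL.match(token)
        if match:
            size = int(match.group(1)) * _UNITS[match.group(2)]
            offset = None
            if match.group(3) is not None:
                offset = int(match.group(3)) * _UNITS[match.group(4)]
            return size, offset
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the arguments the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(parse_crashkernel(f.read()))
```

A result of None would mean the memory reservation was never requested at boot, in which case the kdump service has nowhere to load the capture kernel.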
Was this also after any suspend activity? You can use "echo c > /proc/sysrq-trigger" to trigger a "crash" and see if your crashdump utility is set up properly...

*** Bug 426863 has been marked as a duplicate of this bug. ***

I'm thinking this is most likely use after free... I wonder if we could get you set up & running with a kernel which would catch that, with CONFIG_SLAB_DEBUG or whatever is appropriate for this kernel...

Yes, the latest crash happened after several days of uptime, after which I'd slept and hibernated and restored several times.

I did "echo c > /proc/sysrq-trigger". A bunch of text flew by on the console. Near the end it looked like an attempted run of "fsck" that couldn't find /etc/fstab. I didn't get any files in /var/crash.

The kernel line in /etc/grub.conf I'm using is:
    kernel /boot/vmlinuz-2.6.23.15-137.fc8 ro root=LABEL=/1 rhgb quiet usbcore.autosuspend=1 crashkernel=64M@16M

"/sbin/chkconfig kdump --list" produces:
    kdump 0:off 1:off 2:on 3:on 4:on 5:on 6:off
and I'm using runlevel 5.

Did I do something wrong or incompletely?

If someone can package it in an RPM, I'm happy to run any kernel which would help debug this.

Created attachment 295096 [details]
Kernel oops section in /var/log/messages for kernel 2.6.25-0.50.rc2.fc9
This happened on a current "rawhide" x86_64 box after booting kernel
2.6.25-0.50.rc2.fc9. No additional modules such as "madwifi" had been
installed yet. My home partition uses "ext4dev" but looking at the
initial report this does not appear to be at the root of the issue.
Strangely enough, this oops usually occurs when I'm building some RPM as
an ordinary user in /home. The compiler package is gcc-4.3.0-0.9.
I'm not sure when this issue occurred for the first time - I would
guess sometime this month.
Joachim, please open a new bug for that. It looks completely unrelated to this bug, and in fact is probably an ext4 problem.
Thanks,
-Eric

Christopher, can you try running
http://koji.fedoraproject.org/packages/kernel/2.6.23.15/137.fc8/i686/kernel-debug-2.6.23.15-137.fc8.i686.rpm
(yum install kernel-debug, probably)
it's the same version of your kernel but w/ debugging bells & whistles turned on. if you hit it again it might yield more info, though a crashdump would probably still be best...
-Eric

OK, I'm running kernel-debug-2.6.23.15-137.fc8.i686.rpm now.

This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.