Bug 121732 (IT_41260)
Field | Value
---|---
Summary | oops in refile_inode when running high load
Product | [Fedora] Fedora
Component | kernel
Version | 1
Hardware | i386
OS | Linux
Status | CLOSED WONTFIX
Severity | high
Priority | medium
Reporter | Andrew Ryan <andrewr>
Assignee | Arjan van de Ven <arjanv>
CC | bugs-redhat, steved, tao
Doc Type | Bug Fix
Last Closed | 2004-09-29 20:22:29 UTC
Description
Andrew Ryan
2004-04-26 20:47:11 UTC
Created attachment 99698: ksymoops output
Created attachment 99699: 'vmstat 30' output for period preceding crash
Created attachment 99700: SysRq+T output from oopsed state
I submitted this to the linux-nfs mailing list, and according to Trond, this is a VM bug which should be fixed in FC1 kernels: http://marc.theaimsgroup.com/?l=linux-nfs&m=108301692018612&w=2

That it showed up on tests where we were using an NFS-mounted filesystem is, apparently, just coincidental.

Subject: Re: [NFS] oops in FC1 update kernel, in refile_inode
From: Trond Myklebust <trond.myklebust () fys ! uio ! no>
Date: 2004-04-26 21:56:32

That is indeed a fix for a generic VFS/mm race. It has pretty much nothing to do with NFS itself but just happened to trigger on an NFS partition for someone.

As far as I can see, that patch hasn't yet been applied to the latest errata kernel (linux-2.4.22-1.2188.nptl). Have you tried it out to see if it fixes your Oops?

Steve, could you make sure that patch makes it into any future errata kernels?

Cheers,
Trond

["linux-2.4.26-refile_inode.dif" (linux-2.4.26-refile_inode.dif)]

    --- linux-2.4.26-up/fs/inode.c.orig 2004-03-19 17:12:46.000000000 -0500
    +++ linux-2.4.26-up/fs/inode.c 2004-03-26 13:01:23.000000000 -0500
    @@ -319,7 +319,8 @@ void refile_inode(struct inode *inode)
         if (!inode)
             return;
         spin_lock(&inode_lock);
    -    __refile_inode(inode);
    +    if (!(inode->i_state & I_LOCK))
    +        __refile_inode(inode);
         spin_unlock(&inode_lock);
     }

With the above patch applied to the FC1.2179 kernel, we have not seen the oops in 2 days of constant testing. For reference, we used to see this oops after 2-8 hours of stress testing.

Patch is in cvs, and will be in the next update.

Can this be the same issue as in bug 123332? I've posted there 2 stacktraces from kernel panics, captured with a digital camera.

BTW, I forgot to mention, we're having those kernel panics on Fedora kernel 2.4.22-1.2188.nptlsmp, about once every 2 weeks. This is a production system, so unfortunately we cannot afford to stress-test it to reproduce this artificially. We also cannot connect a serial console, as the machine has only 1 serial port, which has to be connected to a UPS. But the stacktraces captured with the digital camera look exactly the same as the one reported here. We were suspecting this to be a hardware issue with the 3Ware controller that runs our RAID5 array, but in the light of this bug it seems more probable to be a kernel bug, right?

There should be a 2190 kernel in updates-testing, which should have this fixed.

Our system just crashed again; I've installed the 2.4.22-1.2190.nptlsmp kernel package from 2004-05-26 - I'll let you know if it remedies the issue, but the testing period will be long since this crash occurs about twice a month on this particular system.

Does this issue affect Fedora 2's 2.6 kernel?

No. refile_inode doesn't exist there.

Another panic in refile_inode occurred just today on kernel-2.4.22-1.2190.nptlsmp. Either the problem has not been resolved, or this is a separate problem (in which case bug 123332 is not a dupe of this one).

BTW, looking at /usr/src/linux-2.4/fs/inode.c (from the kernel-source-2.4.22-1.2190.nptl RPM), the fix from comment #3 is present there. But the panics still happen.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/

Problem was found and fixed in RHEL3 U3.
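
For anyone reading the patch quoted in the description, the guard it adds can be modelled in user space. This is only a rough sketch, not kernel code: the flag values, the `enum inode_list` type, and the helpers `model_refile()`, `refile_inode_patched()` and `refile_demo()` are hypothetical stand-ins, and the `inode_lock` spinlock is left out because the model is single-threaded. It shows just the one behaviour the patch changes: an inode with `I_LOCK` set is left on whatever list it is already on.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; NOT the kernel's real definitions. */
#define I_DIRTY 0x1u
#define I_LOCK  0x2u

/* Hypothetical stand-in for the kernel's inode lists. */
enum inode_list { LIST_UNUSED, LIST_IN_USE, LIST_DIRTY };

/* Heavily reduced model of struct inode. */
struct inode {
    unsigned int i_state;
    unsigned int i_count;
    enum inode_list list;
};

/* Simplified stand-in for __refile_inode(): pick a list from the state. */
static void model_refile(struct inode *inode)
{
    if (inode->i_state & I_DIRTY)
        inode->list = LIST_DIRTY;
    else if (inode->i_count)
        inode->list = LIST_IN_USE;
    else
        inode->list = LIST_UNUSED;
}

/* Models the patched refile_inode(): skip inodes that have I_LOCK set.
 * The real function does this while holding inode_lock. */
static void refile_inode_patched(struct inode *inode)
{
    if (!inode)
        return;
    if (!(inode->i_state & I_LOCK))
        model_refile(inode);
}

/* Small demonstration; returns 1 if both cases behave as described. */
static int refile_demo(void)
{
    struct inode locked   = { I_DIRTY | I_LOCK, 1, LIST_IN_USE };
    struct inode unlocked = { I_DIRTY,          1, LIST_IN_USE };

    refile_inode_patched(&locked);    /* locked: must not be moved */
    refile_inode_patched(&unlocked);  /* unlocked: moves to the dirty list */

    assert(locked.list == LIST_IN_USE);
    assert(unlocked.list == LIST_DIRTY);
    return 1;
}
```

Calling `refile_demo()` from a test harness exercises both paths. The model says nothing about why the later panics on 2.4.22-1.2190 still occurred with this guard in place.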