Bug 182618
| Summary: | irqbalance makes K8T800Pro system with Athlon64X2 unstable | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexandre Oliva <oliva> |
| Component: | irqbalance | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5 | CC: | davej, marko.macek, peterd, redhat, rhbz, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-08-08 18:45:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 181310, 181920 | | |
| Bug Blocks: | 182617 | | |
Description
Alexandre Oliva
2006-02-23 18:08:27 UTC
*** Bug 181347 has been marked as a duplicate of this bug. ***

Same problem here, on a 32-bit kernel. I built myself a stock kernel after having problems; my kernel is currently:

    title Fedora Core (2.6.16.11)
        root (hd0,0)
        kernel /vmlinuz-2.6.16.11 ro root=LABEL=/ rhgb quiet report_lost_ticks=1 notsc clock=pmtmr console=ttyS0,115200n8 noapic
        initrd /initrd-2.6.16.11.img

I added 'noapic' today and disabled 'irqbalance'. We'll see how things go. If it's OK after a few days, I'll remove the 'noapic'. It usually fails during heavy network activity, or randomly while I'm away. I use an offboard 3c59x NIC, because my onboard one died.

I have the exact same problem... See http://lkml.org/lkml/2006/5/16/67 for more info! - vin

Same(?) problem, different results. Disabling irqbalance did not work for me. Asus P5N32-SLI SE Deluxe motherboard, with Core 2 Duo processor, running kernel 2.6.7.1-2187_FC5. Notable drivers: sky2, sata_sil24, sata_nv.

I am being bitten regularly by the SATA problems described in bug 181310. After disabling irqbalance and running bittorrent for many hours, I got my first occurrence of the network problem described in bug 181347. That was with kernel 2.6.17.-1_2174_FC5.

I have also experienced the jerky mouse movement, but that was with FC6T2, and only when my mouse was connected through a hub - a Dell 2407WFP. The mouse would start smooth, but after a while become jerky. Motion would be smooth again if I plugged it directly into a USB port on the computer. I have since removed FC6T2, because I hadn't yet found all this other bug history. Using a non-beta OS was also important to me because all the hardware was (is) brand new.

I don't even know if I have a bad motherboard or not. My symptoms are almost exactly like what is described by Alexandre, so I'm assuming the motherboard is good, and the kernel is bad. But due to the inactivity on this and the other bz's, I wish it were the other way round!

Small correction:

    - 2.6.7.1-2187_FC5
    + 2.6.17.1-2187_FC5

And I'm running the 64-bit kernels. Let me know if I can be of any assistance in testing fixes for these issues.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

For me, this is the same problem I was having on this bug: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=166437

The important info is that this problem surfaced after kernel-smp-2.6.13-1.1532_FC4 (>= 2.6.14). I tried disabling irqbalance and am running 2.6.17-1.2187_FC5 (x86_64). Locked up after 1 hour of heavy CPU load. MB is ABIT AV8 K8T800 Pro (Via); CPU Athlon64X2 4400+; 4GB mem.

Maybe I'm not seeing the same problem. All I get are lockups after anywhere from 1 hour to as long as 2 days. High load seems to aggravate the problem. I'll re-test with 2.6.18 when the 64-bit package is released (not seeing it in updates yet).

I found a way to get the machine to lock up on cue, so I was able to do more rapid testing... My problem turned out to be an old ('95) Intel EE Pro 100 PCI NIC card. I pulled it and it's been running like a champ for almost a day. I stand by my assertion that 2.6.13 was stable even with this old NIC installed. I wasn't able to get any info from the NMI watchdog. IOMMU maybe? 2.6.18 (2200) running fine.

2.6.18 (2200) NOT running fine for me. I have reproduced the error twice. The message seems to have changed since the last kernel, though. But it is still a timeout.

    ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    ata5.00: revalidation failed (errno=-5)
    ata5: failed to recover some devices, retrying in 5 secs
    ata5.00: qc timeout (cmd 0xec)
    ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
    ata5.00: revalidation failed (errno=-5)
    ata5: failed to recover some devices, retrying in 5 secs
    ata5.00: qc timeout (cmd 0xec)
    ...

Both times were after I had closed a tvtime window (hardware is a bt848-based WinTV card circa 1996). I think this may point to IRQ mismanagement, as another person commented in this collection of related bugs: the bug seems to crop up after a change to the load on the system.

One big difference this time is that the timeouts did not repeat forever. The system seemed to recover after a few timeout errors. However, my RAID array was degraded in the process: sdd was dropped from the two-drive RAID-1 array.

If you added a comment above of the form "I disabled irqbalance and my problem still happened", then it's unlikely to be related to this bug, and you should open a separate one. I'm reassigning this to irqbalance in the hope that Neil has some ideas about what could be going wrong in Alexandre's case.

Confirming this is still a problem with FC6 (uname -r gives 2.6.18-1.2849.fc6). K8T800Pro, Athlon64 4400+. I get the problem where the network interface (a Marvell 88e8001 controller) stops responding until I unload and reload the module. Disabling the irqbalance service resolves the problem.

Alexandre and I have been down this road before. I am completely unable to reproduce this error here on any of my systems, and thus far, the only similarity I can find between the systems that report what appears to be the same problem is that they all contain a variant of the Asus A8 motherboard. Not really sure what to do with this.

My reading has indicated that people with this motherboard have had more success by disabling on-board video and using a separate video card.

I am using an ASUS A8V Deluxe board, so that part sort of jives. I'm using a separate video card though (an Nvidia 6800GT); the A8V Deluxe doesn't have on-board video.
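
Several reports above come down to whether interrupts are being moved between CPUs. One way to observe that, with and without the irqbalance service running, is to compare per-CPU counts in /proc/interrupts over time. The following is only a minimal sketch and was not posted by anyone in this bug: it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per interrupt), and the helper names `parse_interrupts` and `cpu_shares` are made up for illustration.

```python
from pathlib import Path


def parse_interrupts(text):
    """Return {irq_label: (per_cpu_counts, description)} from /proc/interrupts text.

    Assumes the first line is the CPU header (CPU0 CPU1 ...) and every other
    line has the form "<label>: <count> [<count> ...] <description>".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # number of CPU columns in the header
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break  # rows such as ERR: carry fewer counts than CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return table


def cpu_shares(counts):
    """Fraction of an interrupt's total that each CPU has handled."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():  # only present on Linux
        for irq, (counts, desc) in parse_interrupts(path.read_text()).items():
            shares = ", ".join(f"{s:.0%}" for s in cpu_shares(counts))
            print(f"{irq:>4}  {desc:<40}  {shares}")
```

Running it twice, once with irqbalance active and once with the service stopped, should show whether the counts for the affected device (the NIC or SATA controller, say) keep growing on more than one CPU. It only reports the distribution; it says nothing about why a particular chipset misbehaves when interrupts migrate.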