Bug 181310
Summary: | sata_promise command time out kills all disks on SMP | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Alexandre Oliva <oliva> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5 | CC: | bugzilla, cmarco, davej, jlfenton65, k.georgiou, pb, pfrields, rhbz, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-11-24 23:07:33 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 172490, 182618 |
Description
Alexandre Oliva
2006-02-13 13:12:12 UTC
I have the same problem, but with slightly different hardware. It still involves dual AMD64 CPUs and SATA; however, I'm using a Master2-FAR dual Opteron motherboard where the SATA is driven by the VT8237. The drives are identical Hitachi 80G drives, but are not set up as RAID. Heavy disk/network activity, ESPECIALLY bittorrent, causes the SATA drives to go out to lunch. Under normal use it may take many hours for the problem to occur. Using the drive as the target of a usenet reader, it may take three to four hours, but using dozens of connections on several files in bittorrent can make it occur in minutes. The last message I got was:

ata2: command 0xb0 timeout, stat 0x50 host_stat 0x0
ata2: command 0x35 timeout, stat 0x50 host_stat 0x4

That was with kernel 2.6.15-1.2054, which comes with the FC5 release.

I have almost the same setup (Asus A8V-Deluxe, A64, two Maxtor HDs connected to the Promise controller) and the box is stable. So it does seem that the problem is related to SMP.

And I was just about to order 20 A8V-Deluxe, A64 X2, 4 SATA disks for work today :(

Same problem: AMD64 X2 4200+, ABit AV7, 160GB Maxtor SATA. dmesg shortly before the filesystem is gone:

ata1: command 0x35 timeout, stat 0x50 host_stat 0x4

I noticed jumpy, slow-moving behaviour of my USB mouse. Right after that, the SATA problem begins. That must be related somehow. Interrupts getting messed up?

Same problem: AMD64 X2 4600+, Abit AV8 3rd-Eye. There is no real pattern to when it occurs, anywhere from 10 minutes to a number of hours. The machine becomes VERY slow, and if you swap to a console screen you see ...

ata2: command 0x35 timeout, stat 0x50 host_stat 0x4

Adding noapic to my boot options seems to stop it hanging.

The FC4 installation on the same disk works fine. Perhaps related: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=183138#c5
System: Athlon 700 MHz / MSI board. Will try with APM + noapic instead of ACPI now.
noapic and acpi=off didn't help; perhaps it's an Athlon-specific issue. A Promise SATA 150 TX4 has the same problem when used with this CPU/board combination. Both controllers (SATA 150/SATA II 300) work fine on 2 Intel CPU based hosts (PIII-933 and PII-350). BTW: the MSI board is a K7 PRO.

Same problem, but I'm using an Intel Core 2 Duo, and the motherboard has sata_nv and sata_sil24. I see the problem with drives on both controllers. The only difference is that the sata_sil24 controller does a reset (but the errors continue). Otherwise the errors are exactly as described by the others in this bz. Setting maxcpus=1 has fixed it for me so far as well. (THANKS Alexandre!!)
Asus P5N32-SLI SE Deluxe, 4 Seagate 320G SATA-II HDs (2 in RAID1 config, 2 LVM)

I just tried the latest kernel in FC5 updates-testing, and it did not solve this problem for me. I will try FC6 Test 3 when it is released in a few days. If I am indeed hitting this same defect, should I expect this to clear up with FC6t3?

In my case it started a couple of weeks ago, first on my Asus P4P800 Deluxe motherboard with a P4 3.2GHz HT (Intel based: ata_piix):

ata1: command 0xc8 timeout, stat 0x50 host_stat 0x21

I then replaced the suspected disk with a new one, but that didn't solve the problem, so I changed the motherboard to an ASRock P4V88+ (VIA based) and also the PSU (to be sure). That didn't solve the problem either:

ata1: command 0xea timeout, stat 0x50 host_stat 0x0
ata2: command 0x35 timeout, stat 0x50 host_stat 0x4
ata1: command 0x35 timeout, stat 0x50 host_stat 0x4

Additionally, the USB ports stopped detecting new devices, the USB mouse became jumpy, and the USB keyboard lost characters or added some as I typed. Downgrading from kernel-smp-2.6.17-1.2142_FC4 to kernel-smp-2.6.16-1.2069_FC4 helped with the USB problems, but not with the SATA timeouts.
Fedora Core 4
P4 3.2GHz HT, 2GB RAM
Asus P4P800 Deluxe ICH5 (ata_piix) or ASRock P4V88+ (sata_via)
2 x 160GB SATA in RAID1 (/dev/md0) mounted as root

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you have only one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

This bug has been mass-closed along with all other bugs that have been in NEEDINFO state for several months. Due to the large volume of inactive bugs in bugzilla, this is the only method we have of cleaning out stale bug reports where the reporter has disappeared. If you can reproduce this bug after installing all the current updates, please reopen this bug. If you are not the reporter, you can add a comment requesting it be reopened, and someone will get to it asap. Thank you.
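For anyone comparing logs against the timeout messages quoted in the comments above, the following is a minimal sketch of how those lines could be decoded. It is not part of any kernel tool: the function name `decode_timeouts` and the result layout are made up for illustration. The opcode names and status-register bit names are taken from the ATA command set and cover only the values seen in this report. The `host_stat` value is returned raw, since its meaning depends on the controller driver.

```python
import re

# ATA opcodes seen in this report (names from the ATA command set).
ATA_COMMANDS = {
    0x35: "WRITE DMA EXT",
    0xB0: "SMART",
    0xC8: "READ DMA",
    0xEA: "FLUSH CACHE EXT",
}

# ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Matches lines such as:
#   ata2: command 0x35 timeout, stat 0x50 host_stat 0x4
TIMEOUT_RE = re.compile(
    r"(ata\d+): command (0x[0-9a-fA-F]+) timeout, "
    r"stat (0x[0-9a-fA-F]+) host_stat (0x[0-9a-fA-F]+)"
)


def decode_timeouts(log_text):
    """Return one dict per timeout message found in log_text (hypothetical helper)."""
    results = []
    for port, cmd, stat, host_stat in TIMEOUT_RE.findall(log_text):
        cmd_val = int(cmd, 16)
        stat_val = int(stat, 16)
        results.append({
            "port": port,
            "command": ATA_COMMANDS.get(cmd_val, "unknown"),
            "status": [name for bit, name in STATUS_BITS if stat_val & bit],
            "host_stat": int(host_stat, 16),  # left undecoded on purpose
        })
    return results


if __name__ == "__main__":
    sample = (
        "ata2: command 0xb0 timeout, stat 0x50 host_stat 0x0 "
        "ata2: command 0x35 timeout, stat 0x50 host_stat 0x4"
    )
    for entry in decode_timeouts(sample):
        print(entry)
```

Every report in this bug shows stat 0x50, which decodes to DRDY plus DSC with BSY and ERR clear. In other words, the drive itself reports ready and error-free at the moment the command times out, which fits the suspicion raised in the comments that interrupt delivery, not the disk, may be at fault.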