Bug 101357
Summary: | (IDE PDC202XX) ata failure with Severn | |
---|---|---|---
Product: | [Retired] Red Hat Linux Beta | Reporter: | djh <djh>
Component: | kernel | Assignee: | Dave Jones <davej>
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | beta2 | CC: | alan, leonard-rh-bugzilla, pfrields, riel
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | i386 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | 2.4.22-1.2086.nptl | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2003-10-10 01:04:08 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 100643 | |
Attachments: | | |
Description
djh
2003-07-31 04:47:36 UTC
Created attachment 93286: lspci
What happens if you use the i686 kernel instead of the athlon kernel?

Same result with the i686 version. (acpi=off) I'll try some recent vanilla and -ac kernels later.

I could not reproduce it with 2.4.21, 2.4.22-pre6-ac1, or with Arjan's 2.6 RPMs (2.6.0-0.test1.1.26 and 2.6.0-0.test2.1.28).

What happens if you turn off the loading of smartd ("chkconfig --level 35 smartd off") and reboot the machine? On one of my machines smartd does a devicescan and due to this my ide-tape drive does funny things.

It stops the errors about hdc. No change with hde.

It's something in the severn stuff - I've seen multiple reports and even with ACPI and "all the usual suspects" enabled it only happens with the RH tree. It's really quite weird and I really don't know what severn is doing here.

I've just tried moving the drive from the Promise to the VIA controller - same result. BTW here's another report - (the only hardware in common is the hard drive) http://www.redhat.com/archives/rhl-beta-list/2003-July/msg00962.html

I tried the Severn2 kernel (2.4.22-1.2061.nptl) with the Severn1 installation and the same errors occur.

I have just noticed what has changed - when using the Severn kernels the hard drive spins down after 5-10 mins. (I'd really like to know why)

Bugzilla has lost the last few comments, so here is a summary. laptop_mode is disabled. "HDD power down" is disabled in BIOS. After a fresh install of Fedora 0.94 it still occurs. (0x51/04 errors, ide and ext3 failures, reset, manual fsck if required)

We're starting to suspect DMA problems with Fireball drives: this is the third report I've been able to find, and the drive is the only common factor. (Different chipsets each time.) If you feel motivated to investigate this, can you paste the boot messages of both a RHL9 and a cambridge kernel so we can see how they differ? Additionally, booting with ide=nodma may work around the corruption if our guesses are correct.
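The boot-message comparison requested above could be narrowed to the IDE-related lines before diffing. The following is only a rough sketch: the function names, the default labels `rhl9` and `severn`, and the line filter are assumptions made for illustration, not anything used in this report.

```python
import difflib
import re

# Hypothetical filter: keep lines that mention an IDE channel (ide0, ide1, ...),
# an hdX device, or DMA. Adjust to taste; this is not an exhaustive pattern.
IDE_LINE = re.compile(r"(\bide\d*\b|\bhd[a-z]\b|dma)", re.IGNORECASE)


def ide_lines(log_text):
    """Return only the boot-log lines that look IDE-related."""
    return [line.strip() for line in log_text.splitlines()
            if IDE_LINE.search(line)]


def diff_boot_logs(old_log, new_log, old_name="rhl9", new_name="severn"):
    """Unified diff of the IDE-related lines of two boot logs (as strings)."""
    return list(difflib.unified_diff(
        ide_lines(old_log), ide_lines(new_log),
        fromfile=old_name, tofile=new_name, lineterm=""))
```

Given the saved `dmesg` text of each kernel as a string, `diff_boot_logs(old, new)` returns the diff lines, so a change in the negotiated transfer mode or in driver messages shows up as a `-`/`+` pair while unrelated boot noise is left out.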
You might want to add that Quantum drive to the local blacklist for the PDC202xx - not sure why it should bite just the Quantum though.

It'll need adding in multiple places if that's the case, as this has been seen on at least 3 different controllers now. Also #91932 looks very similar (same hardware, also seeing corruption). Disabling DMA didn't help in that case, so it's back to the drawing board. Are you using LVM?

I'm interested to hear if this fares any better... http://people.redhat.com/davej/2.4.22-1.2086.nptl/

No LVM, and ide=nodma didn't help much. (btw I can't reproduce it with the Taroon kernel - 2.4.21-3.EL)

2.4.22-1.2086.nptl is looking good so far.

Any update on this? Is it behaving now?

With the limited amount of testing I've been able to do, 2.4.22-1.2086.nptl seems to fix the problem. 2086 lasts for over 6 hours - previous Severn kernels would fail within 20 mins. I'll do some further tests, but I believe the problem is fixed.

Sounds promising. Looks like the acoustic management patch doesn't play well with these drives. Thanks for chasing this. Can you paste the output of hdparm -I /dev/hd? from that Quantum Fireball please?
/dev/hde:

ATA device, with non-removable media
    Model Number:       QUANTUM FIREBALLP AS40.0
    Serial Number:      194034230190
    Firmware Revision:  A1Y.1300
Standards:
    Used: ATA/ATAPI-5 T13 1321D revision 1
    Supported: 5 4 3 2 & some of 6
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:   78177792
    device size with M = 1024*1024:       38172 MBytes
    device size with M = 1000*1000:       40027 MBytes (40 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    bytes avail on r/w long: 4      Queue depth: 1
    Standby timer values: spec'd by Vendor, no device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Recommended acoustic management value: 254, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    READ BUFFER cmd
       *    WRITE BUFFER cmd
       *    Host Protected Area feature set
       *    Look-ahead
       *    Write cache
       *    Power Management feature set
            Security Mode feature set
       *    SMART feature set
       *    Automatic Acoustic Management feature set
       *    DOWNLOAD MICROCODE cmd
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
    24min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
HW reset results:
    CBLID- above Vih
    Device num = 0 determined by CSEL
Checksum: correct
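For comparing several drives, the fields of interest in output like the above (sector count, acoustic management values, selected DMA mode) can be pulled out with a few regular expressions. This is a minimal sketch, not part of the original report: `summarize_identify`, the dictionary keys, and the 512-byte sector size are assumptions made for illustration.

```python
import re

# Assumption: 512-byte logical sectors, which matches the sizes hdparm prints above.
SECTOR_BYTES = 512


def summarize_identify(text):
    """Extract a few fields from `hdparm -I` text and return them as a dict."""
    info = {}

    m = re.search(r"LBA\s+user addressable sectors:\s*(\d+)", text)
    if m:
        sectors = int(m.group(1))
        info["sectors"] = sectors
        # hdparm truncates both figures, so integer division is used here.
        info["mib"] = sectors * SECTOR_BYTES // (1024 * 1024)
        info["mb"] = sectors * SECTOR_BYTES // (1000 * 1000)

    m = re.search(r"Recommended acoustic management value:\s*(\d+),"
                  r"\s*current value:\s*(\d+)", text)
    if m:
        info["aam_recommended"] = int(m.group(1))
        info["aam_current"] = int(m.group(2))

    # The selected transfer mode is the one prefixed with '*' on the DMA line.
    m = re.search(r"\*((?:u|m)dma\d)", text)
    if m:
        info["active_dma"] = m.group(1)

    return info
```

For this drive, 78177792 sectors of 512 bytes come to 40,027,029,504 bytes, which truncates to 38172 units of 1024*1024 bytes and 40027 units of 1000*1000 bytes, the same two sizes hdparm reports. The sketch also returns 254 for both acoustic management values and `udma5` as the selected DMA mode.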