Bug 199142
Summary: | sata_promise(?) BUG unable to handle NULL pointer dereference | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Josep <josep.puigdemont> |
Component: | kernel | Assignee: | Jeff Garzik <jgarzik> |
Status: | CLOSED UPSTREAM | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 5 | CC: | davej, ddalton, dzrudy, jason_mack, jason__m, lsof, mail, mh, peterm, p.r.schaffner, whitefrost01, wtogami |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | athlon | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-11-14 16:19:53 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 172490 | | |
Attachments: | | | |
Description
Josep
2006-07-17 14:58:33 UTC
Created attachment 132549 [details]
Boot messages
Created attachment 132550 [details]
lspci
Created attachment 132551 [details]
verbose lspci
I meant to do a "lspci -v" for more info. Here we go, sorry for the noise.
I've just realized this might be a duplicate of #197441 (commenting there too)
Created attachment 132918 [details]
lspci -v
I can confirm this bug. On my system the problem is even worse. When booting I
don't get any further than the message "Red Hat nash version 5.0.32 starting".
After that I get "BUG: unable to handle kernel NULL pointer dereference ..."
(see the above attachment of boot messages for the rest of it, it looks very
similar here).
I attached the output of "lspci -v".
I forgot to mention the kernels I tried during the last weeks:
2.6.16-1.2133_FC5smp -> works
2.6.17-1.2139_FC5smp -> problem
2.6.17-1.2145_FC5smp -> problem
2.6.17-1.2157_FC5smp -> problem
2.6.17-1.2159_FC5smp -> problem
The problem still persists with kernel-2.6.17-1.2174_FC5
If I am not mistaken, FC6 will use 2.6.17 kernels (and above), which means it won't work for people using that SATA controller. Should we raise the severity to HIGH? I'm still not sure if it is a duplicate of bug 197441.
This is just a "me-too" comment. I'm seeing the same thing on my Athlons (both 64 and 32 bit machines).
Is someone looking into this? All bugs filed on this issue seem to be unassigned.
Created attachment 136477 [details]
Strange messages
These strange messages appear _always_ after loading sata_promise.
Notice though that the media check dialog looks fine (all messages after that
are ok)
I can confirm that this issue is still present in the current kernel 2.6.17-1.2187_FC5. I can also confirm that it disappears if I disable the SATA P20579 controller in the BIOS.
About comment #9, it's the fc6t3 installation cd. The strange text messages always appear when the sata controller is enabled, and never when it is disabled.
Just FYI, commenting out the linux-2.6-sata-promise-pata-ports.patch from the spec file in the kernel's src.rpm solved the problem for me. See bug #201966
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.
This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.
In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Hi Dave, I have just installed the new kernel and it still doesn't boot until I rebuild the rpm without the sata-promise-pata-ports patch. I have also plugged my drives to my second onboard controller (via_sata driver) and disabled the Promise one in the BIOS.
After that the kernel's boot process goes a little further, but the machine freezes at the "starting UDEV" stage. While running on via_sata, I was able to see that sata_promise is still being loaded and causes a traceback. I couldn't catch the full traceback since I don't have a serial console and boot_delay=500 doesn't help because it's an initrd stage; however, it seems to be identical to the one I reported in my other bug (#20166). Also in this case (running on via_sata) removing the sata-promise-pata-ports patch allows me to boot successfully.
Like Dawid, I can also confirm that the kernel still crashes with a similar error message at startup (the NULL pointer dereference). Although the kernel seems to continue booting, it freezes when starting udev (I left it for a couple of hours without anything happening at one time). When disabling the SATA P20579 device in the BIOS (as I mention in comment 10), the kernel boots without problems (udev gives an error that I never remember to write down, I don't think it's relevant though). Although probably not relevant, I noticed that windows crashes when trying to turn off the computer (it resets instead, due to the crash). When disabling the PATA device, windows can turn off the computer normally.
I fixed the problem apparently by removing the logical volumes and manually configuring the partitions without any logical volumes.
Just to confirm that this bug is still present with the kernel shipped with FC 6 (2.6.18), although the crash (the "unable to handle NULL pointer reference") happens at a later stage, around when it starts udev. Please let me know if you'd like me to send the boot messages log.
Yes, if the crash message makes it into the logs, please do attach.
Hi, this is the "backtrace" generated during boot (I attached a serial terminal so I could get the messages captured in a file). This is an extract of the log that I'll attach (some udev messages were mixed up in here, sorry).
You'll see the nvidia kernel module inserted. If you think that's causing the problem, I'll remove it. I need to add that sometimes the crash actually happened _before_ the nvidia module was inserted.
BUG: unable to handle kernel NULL pointer dereferenced udevd-event[ at virtual address 00000008 704]: wait_for_s printing eip: ysfs: file '/sys*pde = 3f657067 /devices/pci0000Oops: 0000 [#1] SMP
last sysfs file: /devices/pci0000:00/0000:00:10.2/usb3/usbdev3.1_ep00/dev
Modules linked in: soundcore r8169 emu10k1_gp gameport ohci1394 ieee1394 sata_promise nvidia(U) i2c_viapro k8_edac edac_mc i2c_core serio_raw ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod usb_storage sata_via libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<f89a154c>] Tainted: P VLI
EFLAGS: 00010202 (2.6.18-1.2798.fc6 #1)
EIP is at pdc_sata_scr_read+0x14/0x17 [sata_promise]
eax: 00000008 ebx: f71c035c ecx: f71c035c edx: 00000002
esi: f7e62df8 edi: f71c035c ebp: f71bcd78 esp: f7e62d98
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 723, ti=f7e62000 task=f7f04920 task.ti=f7e62000)
Stack: f88f1a22 f71bc930 f71bc4e8 f88f657d f88fcceb f7c18c80 00000053 f88fcd97
f88aa300 f88aa338 00000000 000000c1 f7087400 f7c18c80 00000000 00000003
f7fa3048 00000002 00000002 00000003 00000003 00000005 00000005 f71bc930
Call Trace:
[<f88f1a22>] sata_scr_read+0x1a/0x28 [libata]
[<f88f657d>] ata_device_add+0x406/0x765 [libata]
[<f89a1a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise]
[<c04f0293>] pci_device_probe+0x36/0x57
[<c05525b1>] driver_probe_device+0x45/0x9a
[<c05526dc>] __driver_attach+0x65/0x8f
[<c0552036>] bus_for_each_dev+0x37/0x59
[<c0552512>] driver_attach+0x16/0x18
[<c0551d2e>] bus_add_driver+0x6f/0x10d
[<c04f03c5>] __pci_register_driver+0x49/0x63
[<c043f1fb>] sys_init_module+0x17de/0x1977
[<c0404013>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
=======================
Code: 69 42 a8 c7 e8 ab 44 a6 c7 83 c4 14 89 da 89 f0 5b 5e e9 1d 81 f5 ff 89 c1 83 c8 ff 83 fa 02 77 0c 8d 04 95 00 00 00 00 03 41 68 <8b> 00 c3 8b 80 b8 1f 00 00 8b 40 18 8b 40 40 c3 83 fa 02 53 89
EIP: [<f89a154c>] pdc_sata_scr_read+0x14/0x17 [sata_promise] SS:ESP 0068:f7e62d98 :00/0000:00:06.0/bus' appeared after 0 loops
Created attachment 141013 [details]
Boot messages, with udev messages set to "debug"
Boot messages with udev debug messages.
Notice that close to the bottom of the file, you'll see a string "FET", this is
the translated message "DONE" that appears when each of the init scripts
successfully finishes.
Right now it is impossible for me to boot into Fedora, but if you need more
information or need me to change something, I have a dual boot with debian, and
from there I could modify anything needed.
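The register dump in the first oops above already hints at where the NULL pointer comes from. The sketch below is only a reading aid, not anything taken from the driver: it assumes that `edx` carries the SCR register index and that SCR registers are 4 bytes apart, which is my own decoding of the `Code:` bytes (`lea eax,[edx*4]`, `add eax,[ecx+0x68]`, `mov eax,[eax]`) and is not stated anywhere in this thread. The helper names are hypothetical.

```python
import re

# Register line and faulting address copied from the first oops in this report.
REGISTERS = "eax: 00000008 ebx: f71c035c ecx: f71c035c edx: 00000002"
FAULT_ADDRESS = 0x00000008

SCR_REG_WIDTH = 4  # assumption: SCR registers are 32-bit, so 4 bytes apart


def parse_registers(line):
    """Turn 'eax: 00000008 ebx: ...' into {'eax': 8, 'ebx': ...}."""
    pairs = re.findall(r"(\w+):\s*([0-9a-f]{8})", line)
    return {name: int(value, 16) for name, value in pairs}


def implied_scr_base(fault_address, sc_reg):
    """Base address for which base + sc_reg * width equals the fault address."""
    return fault_address - sc_reg * SCR_REG_WIDTH


regs = parse_registers(REGISTERS)
# Assumption: edx holds the register index passed to pdc_sata_scr_read.
base = implied_scr_base(FAULT_ADDRESS, regs["edx"])
print(f"sc_reg={regs['edx']} implied base={base:#x}")
```

Under those assumptions the implied base is 0, which would be consistent with the port having no SCR base address set when the read happens.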
Just to make sure, these are the messages without the nvidia module inserted.
Press 'I' to enter interactive startup.
S'est. configurant el rellotge (utc): dl nov 13 02:11:57 CET 2006 [ FET ]
S'est. iniciant el udev: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
*pde = 3ef44067
Oops: 0000 [#1] SMP
last sysfs file: /class/input/input2/event2/dev
Modules linked in: ieee1394 r8169 ide_cd sata_promise emu10k1_gp i2c_viapro k8_edac cdrom gameport i2c_core pcspkr edac_mc snd_seq_device snd_timer snd_page_alloc snd_util_mem serio_raw snd_hwdep snd soundcore dm_snapshot dm_zero dm_mirror dm_mod usb_storage sata_via libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<f898754c>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-1.2798.fc6 #1)
EIP is at pdc_sata_scr_read+0x14/0x17 [sata_promise]
eax: 00000008 ebx: f723c35c ecx: f723c35c edx: 00000002
esi: f7d38df8 edi: f723c35c ebp: f7218d78 esp: f7d38d98
ds: 007b es: 007b ss: 0068
Process modprobe (pid: 617, ti=f7d38000 task=f7f08430 task.ti=f7d38000)
Stack: f88f1a22 f7218930 f72184e8 f88f657d f88fcceb f71458c0 00000053 f88fcd97
f8866300 f8866338 00000000 000000b1 f7037000 f71458c0 00000000 00000003
f7fa3048 00000002 00000002 00000003 00000003 00000005 00000005 f7218930
Call Trace:
[<f88f1a22>] sata_scr_read+0x1a/0x28 [libata]
[<f88f657d>] ata_device_add+0x406/0x765 [libata]
[<f8987a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise]
[<c04f0293>] pci_device_probe+0x36/0x57
[<c05525b1>] driver_probe_device+0x45/0x9a
[<c05526dc>] __driver_attach+0x65/0x8f
[<c0552036>] bus_for_each_dev+0x37/0x59
[<c0552512>] driver_attach+0x16/0x18
[<c0551d2e>] bus_add_driver+0x6f/0x10d
[<c04f03c5>] __pci_register_driver+0x49/0x63
[<c043f1fb>] sys_init_module+0x17de/0x1977
[<c0404013>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
=======================
Code: 69 e2 a9 c7 e8 ab e4 a7 c7 83 c4 14 89 da 89 f0 5b 5e e9 1d 21 f7 ff 89 c1 83 c8 ff 83 fa 02 77 0c 8d 04 95 00 00 00 00 03 41 68 <8b> 00 c3 8b 80 b8 1f 00 00 8b 40 18 8b 40 40 c3 83 fa 02 53 89
EIP: [<f898754c>] pdc_sata_scr_read+0x14/0x17 [sata_promise] SS:ESP 0068:f7d38d98
udevd-event[608]: run_program: '/sbin/modprobe' abnormal exit
<6>Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Created attachment 141053 [details]
Proposed fix
See if this patch fixes things.
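For anyone comparing the two oopses posted above (with and without the nvidia module), the module load addresses differ but the symbols and offsets can be compared directly. The following is a rough sketch of that comparison. The regular expression and the `frames` helper are my own, written against the trace format shown in this report, and only the first four frames of each trace are copied in.

```python
import re

# One call-trace frame: [<address>] symbol+0xoffset/0xsize [optional module]
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def frames(trace):
    """Return (symbol, offset, module) per frame, dropping load addresses."""
    return [(m["symbol"], int(m["offset"], 16), m["module"])
            for m in FRAME.finditer(trace)]


# First four frames of each trace, copied from the two oopses in this report.
WITH_NVIDIA = (
    "[<f88f1a22>] sata_scr_read+0x1a/0x28 [libata] "
    "[<f88f657d>] ata_device_add+0x406/0x765 [libata] "
    "[<f89a1a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise] "
    "[<c04f0293>] pci_device_probe+0x36/0x57"
)
WITHOUT_NVIDIA = (
    "[<f88f1a22>] sata_scr_read+0x1a/0x28 [libata] "
    "[<f88f657d>] ata_device_add+0x406/0x765 [libata] "
    "[<f8987a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise] "
    "[<c04f0293>] pci_device_probe+0x36/0x57"
)

same_path = frames(WITH_NVIDIA) == frames(WITHOUT_NVIDIA)
print("same call path:", same_path)
```

Both traces reach the fault through the same symbols at the same offsets, which fits the observation that the crash also happens without the nvidia module loaded.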
*** Bug 198937 has been marked as a duplicate of this bug. ***
*** Bug 201966 has been marked as a duplicate of this bug. ***
Trying to apply your patch, I discovered that the file drivers/ata/sata_promise.c does not exist, although there is a drivers/scsi/sata_promise.c with similar (probably identical) code. Should I patch the latter file instead? I am using the following source rpm: kernel-2.6.18-1.2849.fc6.src.rpm
*** Bug 212320 has been marked as a duplicate of this bug. ***
Jeff, I applied it to drivers/scsi/sata_promise.c and it now oopses in pdc_sata_scr_write instead of pdc_sata_scr_read. It seems to have exactly the same "if" condition, so I'll patch it with the same change and let you know if it helps.
Created attachment 141129 [details]
sata_promise patch
Applying the same "if" condition in pdc_sata_scr_read and pdc_sata_scr_write
solved the problem for me :-)
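The attachment itself is not reproduced in this thread, so the following is only a toy model of the idea described here: the read and write accessors share one guard, and fixing only one of them moves the crash to the other. Everything in it is hypothetical, including the `Port` class, the `has_scr` helper, and the choice of "no SCR base" as the extra condition. It is not the kernel code and not the actual patch. The index limit of 2 comes from my own reading of the `Code:` bytes in the oops.

```python
from typing import Dict, Optional

MAX_SCR_REG = 2           # assumption: indexes above 2 are rejected
SCR_INVALID = 0xFFFFFFFF  # value returned when no register can be read


class Port:
    """Hypothetical stand-in for one controller port (not a kernel structure)."""

    def __init__(self, scr_base: Optional[int], mmio: Dict[int, int]) -> None:
        self.scr_base = scr_base  # None models a port without an SCR block
        self.mmio = mmio          # fake register space: address -> value


def has_scr(port: Port, sc_reg: int) -> bool:
    """Shared guard: the index is in range and the port has an SCR block."""
    return sc_reg <= MAX_SCR_REG and port.scr_base is not None


def scr_read(port: Port, sc_reg: int) -> int:
    if not has_scr(port, sc_reg):
        return SCR_INVALID
    return port.mmio.get(port.scr_base + sc_reg * 4, 0)


def scr_write(port: Port, sc_reg: int, value: int) -> None:
    if not has_scr(port, sc_reg):
        return  # nothing to write to; ignore instead of dereferencing
    port.mmio[port.scr_base + sc_reg * 4] = value


registers: Dict[int, int] = {}
sata = Port(scr_base=0x400, mmio=registers)  # port with SCR registers
pata = Port(scr_base=None, mmio=registers)   # port without them

scr_write(sata, 2, 0x300)
scr_write(pata, 2, 0x300)  # guarded: leaves the register space untouched
```

Keeping the condition in a single helper is one way to avoid the situation reported above, where patching only the read path left the same fault in the write path.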
I have removed the sata promise patch from the kernel config to get a TX2 150 going. It is running now, but gives me a lot of warning messages:
kernel: ata1: status=0x50 { DriveReady SeekComplete }
kernel: ata1: no sense translation for status: 0x50
kernel: ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Same problem as described here: http://lkml.org/lkml/2006/10/25/198 ?
I have committed the scr_read and scr_write modifications to jgarzik/libata-dev.git#promise-sata-pata (which will eventually trickle up to -mm). For other errors (e.g. White FrosT @ Comment #28) please file a separate bug report.
Jeff, does it mean that this patch will make it into the next kernel update for FC6 and 5 (it would enable me to upgrade to FC6) or will we have to wait until it is accepted by upstream?
Jeff, thanks for the patch. I hope to try it this weekend. My kernel BUG serial output for sata_promise is posted over in Bug 199216. Speaking of that, there seem to be a lot of outstanding bug reports on this issue. These are the ones I found so far:
Bugzilla Bug 199142: sata_promise BUG unable to handle NULL pointer dereference
Bugzilla Bug 198937: sata_promise crash at boot
Bugzilla Bug 201966: Anaconda can't find my HDD on sata_promise
Bugzilla Bug 199216: Sata_promise works for kernel 2.6.16-1.2122_FC5 but not for 2.6.17-1.2157_FC5
Is there someone (QA maybe) who can scrub out the related bugs and mark them as dups of this one so we all know where to go to find the progress reports?
*** Bug 199216 has been marked as a duplicate of this bug. ***
(In reply to comment #30)
> Does it mean that this patch will make it into the next kernel update for FC6 and 5 (it
> would enable me to upgrade to FC6) or will we have to wait until it is accepted
> by upstream?
I have the same question. It does seem a bit odd to me to close the bug before there is a record posted of a successful test. I can provide that, though.
I used the patch 141129 from below ("sata_promise patch") on FC5 kernel 2.6.18, #2239 and I successfully booted, SATA drives seem ok. I also recompiled with SMP turned on, and this worked too.
I can also confirm that with the given patch the described problem went away. What I'm waiting for now is a new official kernel for FC6 with the patch, it's been a while ;-)
kernel-2.6.18-1.2860.fc6 from updates-testing repo does work correctly with my Promise SATA controller/drive.
(In reply to comment #35)
> kernel-2.6.18-1.2860.fc6 from updates-testing repo does work correctly with my
> Promise SATA controller/drive.
Agreed. 2.6.18-1.2860.fc6 from updates-testing works here too. Hardware: Mass storage controller: Promise Technology, Inc. PDC20575 (SATAII150 TX2plus) (rev 02). Now, will a "clean" install of FC6 over http/ftp work, or do I pull the card during install? In other words, does the 2860 kernel ever become the one that Anaconda will use for initial boot/install?
> Now, will a "clean" install of FC6 over http/ftp work, or do I pull the card
I tried the Zod live cd, and it was fine.
Does this mean the update has made the non-testing FC6 kernel?