Bug 98849
Field | Value
---|---
Summary | (ACPI) ACPI oops on ThinkPad T30, T40
Product | [Retired] Red Hat Linux Beta
Component | kernel
Version | alpha 3
Hardware | All
OS | Linux
Status | CLOSED RAWHIDE
Severity | medium
Priority | medium
Reporter | Bill Nottingham <notting>
Assignee | Jeff Garzik <jgarzik>
QA Contact | Brian Brock <bbrock>
CC | acpi-bugzilla, histed, peterm, rderooy, riel, rvokal, tmus
Doc Type | Bug Fix
Last Closed | 2003-10-21 20:11:00 UTC
Bug Blocks | 100643

Description
Bill Nottingham
2003-07-09 15:16:53 UTC
Created attachment 92833 [details]
look, it blew up
Created attachment 92834 [details]
lspci output
SMBIOS 2.33 present.
DMI 0.0 present.
56 structures occupying 1943 bytes.
DMI table at 0x000E0010.
Handle 0x0000
  DMI type 0, 20 bytes.
  BIOS Information Block
    Vendor: IBM
    Version: 1RET32WW (1.03 )
    Release: 03/04/2003
    BIOS base: 0xDC000
    ROM size: 960K
    Capabilities:
      Flags: 0x000000007D09DF80

I upgraded the BIOS to:

SMBIOS 2.33 present.
DMI 0.0 present.
56 structures occupying 1943 bytes.
DMI table at 0x000E0010.
Handle 0x0000
  DMI type 0, 20 bytes.
  BIOS Information Block
    Vendor: IBM
    Version: 1RET34WW (1.05 )
    Release: 05/15/2003
    BIOS base: 0xDC000
    ROM size: 960K
    Capabilities:
      Flags: 0x000000007D09DF80

ACPI behaves similarly in regards to loading the modules (still oopses) with the added benefit that if you run /sbin/hwclock, the machine locks up. (hwclock runs fine with acpi=off)

From the 1st attachment:

  ACPI-0165: *** Warning: The ACPI AML in your computer contains errors, please nag the manufacturer to correct it.
  ACPI-0168: *** Warning: Allowing relaxed access to fields; turn on CONFIG_ACPI_DEBUG for details.

Okay, we're out on a limb on this box...

  ACPI: Embedded Controller [EC] (gpe 28)
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.PUBS._STA] (Node dff55884), AE_TIME
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.BAT0._STA] (Node dff5435c), AE_TIME
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.BAT1._STA] (Node dff5453c), AE_TIME

Looking not good. Possible that the later crash is related to this failure. Maybe we should take evasive action when fed methods that don't run?

Bill, which IBM laptop is this? I'm wondering if it is one that UnitedLinux already blacklisted. Can you attach the DSDT?

IBM T40p is the laptop. I'll attach the DSDT at some point when I boot back into ACPI.

Created attachment 92886 [details]
acpidmp output
2.5.75 oopses as well, but does *not* have the bad interactions with hwclock. I'm 99% sure that these modules (processor, ac, etc.) worked in a previous 2.5 release (2.5.5x? 2.5.6x?)

cool, we have a T40p so we should be able to duplicate this. I might need help with hwclock issues, though.

Created attachment 92974 [details]
another oops
This just happened in the course of normal use... since it was keventd, it took
the keyboard with it.

Where can we get a copy of the kernel that is failing? dmesg shows 2.4.21-1.2023, including ACPICA 20030522, built 7/7/2003 -- which is much newer than Cambridge alpha-3.

Cambridge B1 has something slightly newer than that.

I can duplicate this on my T40 (2373-72U). Yeah, not good. I didn't get an oops but I got a hang by pressing the power button. There are a lot of symptoms listed in this bug, but I'm going to start with the EC errors on boot. Upgrade to BIOS 1.07 didn't fix anything in this area. I have a msg to the FreeBSD ACPI folks too -- maybe they can shed some light.

I reproduced it on my T40 and found that the oops only appears when the ACPI_DEBUG option is on. The EC errors occur because reads and writes on the EC's operation region go wrong. More precisely, the 'handler_context' parameter (called the context below) of the routine 'acpi_ec_space_handler' is wrong, which makes reads and writes on the EC's operation region fail. If the context is correct (in my test I used a temporary variable to achieve that), everything is OK. The incorrect context only appears after an operation region's handler has been removed and reinstalled. So I guess the routine 'acpi_remove_address_space_handler' has a bug.

Created attachment 93319 [details]
patch for the bug
I made a patch. With this patch, the kernel boots correctly on my T40. All
ACPI modules load without error, and '/proc/acpi' contains correct info.
Maybe this is what we need.

great debugging! So, it doesn't look like your patch fixes acpi_remove_address_space_handler. So is it just a workaround? How much trouble would a real fix be? I'm on vacation right now (HAHA) so I can't look at the code but if it's just some incorrect dereferencing then we should fix it.

*** Bug 100667 has been marked as a duplicate of this bug. ***

*** Bug 102581 has been marked as a duplicate of this bug. ***

As requested, I tried the patch on my T30. Good news all round. I no longer get the EC errors when booting, and the system no longer panics on halt. In effect the patch solves the bug I reported (102581).

looks to me like the evregion.c part of the patch has no effect. Can you verify that the modifications to ec.c are sufficient to fix things? Thanks. BTW when we apply this to ec.c we will want to fix the whitespace.

with just the ec.c patch I again get these errors on boot:

  ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
  ACPI: Embedded Controller [EC] (gpe 28)
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.PUBS._STA] (Node f7ff560c), AE_TIME
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.BAT0._STA] (Node f7ff5ecc), AE_TIME
  ACPI-0345: *** Error: Handler for [EmbeddedControl] returned AE_TIME
  ACPI-1121: *** Error: Method execution failed [\_SB_.PCI0.LPC_.EC__.BAT1._STA] (Node f7ff429c), AE_TIME
  ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGP_._PRT]
  ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
  PCI: Probing PCI hardware
  ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 10
  ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 9
  ACPI: PCI Interrupt Link [LNKH] enabled at IRQ 5
  PCI: Using ACPI for IRQ routing
  PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'

I have not yet tried to do a halt, but I suspect it will panic again.

confirmed, I get the panic on halt again. If you want the logs, let me know and I will hook up the serial terminal again to capture the output like I did before.

FYI/A Problem hasn't changed at all on my IBM ThinkPad-T30 with the latest rawhide kernel 2.4.22-20.1.2024.2.36.nptl. Let me know if you need any logs or need me to try something out!

I don't think we've integrated that patch yet, as that patch has yet to be integrated into upstream ACPI code.

I must be experiencing code-blindness :) Can someone take the time to tell me how the evregion.c part of the patch fixes things for people, as it clearly does but I don't see why. Either I have a ** or a * which I then take the address of when I use it - should be the same, eh?

in file evregion.c, acpi_ev_detach_region:

  line 397: region_context = region_obj2->extra.region_context;
  line 452: status = region_setup (region_obj, ACPI_REGION_DEACTIVATE, handler_obj->address_space.context, &region_context);

This code seems intended to change 'region_obj2->extra.region_context', but 'region_context' is defined as a local 'void *', so the call only changes the local copy and has no effect on the field. My patch just makes it do the right thing, that is, the code can actually change 'region_obj2->extra.region_context'.

Works for me now with .208x or later.

Well, ACPI was completely removed from the kernel, right (at least for these machines or something)... My acpid no longer starts but apmd does. The system doesn't crash though and that's nice! If I didn't miss something completely, I wouldn't say that circumventing a problem means it's resolved. Could someone please comment on this???

ACPI is included, it just must be explicitly enabled with acpi=on. The fix for the crash *is* included in the latest ACPI code.

Ahh, I see, thanks for clearing that up for me!
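
The evregion.c question in the thread ("a ** or a * which I then take the address of ... should be the same, eh?") comes down to ordinary C pointer semantics, so a small stand-alone sketch may help. It is not the ACPICA code and not the attached patch: `struct region_extra`, `region_setup_deactivate`, `detach_with_local_copy` and `detach_with_field_address` are hypothetical names invented here, and the setup callback is assumed to clear the context on deactivate. It only illustrates why passing the address of a local copy leaves the stored field untouched, while passing the address of the field itself does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-region data named in the thread. */
struct region_extra {
    void *region_context;   /* context later handed to the region handler */
};

/* Assumed behaviour of a setup callback on deactivate: it resets the
 * context through the pointer-to-pointer it receives. */
static void region_setup_deactivate(void **region_context)
{
    assert(region_context != NULL);
    *region_context = NULL;
}

/* Pattern the thread describes as ineffective: the field is copied into a
 * local 'void *' and the local's address is passed. Only the copy is reset;
 * the field keeps its old (now stale) value. */
static void detach_with_local_copy(struct region_extra *extra)
{
    void *region_context = extra->region_context;
    region_setup_deactivate(&region_context);
}

/* Pattern the patch author describes: keep a 'void **' that points at the
 * field itself, so the callback's update reaches the stored context. */
static void detach_with_field_address(struct region_extra *extra)
{
    void **region_context = &extra->region_context;
    region_setup_deactivate(region_context);
}
```

Calling `detach_with_local_copy` on a structure whose `region_context` is non-NULL leaves the field unchanged, whereas `detach_with_field_address` resets it to NULL. Under the assumptions above, that matches the symptom reported earlier, where the context was only wrong after a handler had been removed and reinstalled.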