Bug 1898161 - aarch64: kernel panic booting thunderx1 platforms with acpi
Summary: aarch64: kernel panic booting thunderx1 platforms with acpi
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Mark Salter
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
 
Reported: 2020-11-16 14:57 UTC by Mark Salter
Modified: 2021-03-10 17:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 17:56:04 UTC
Type: Bug
Embargoed:



Description Mark Salter 2020-11-16 14:57:03 UTC
1. Please describe the problem:

The following splat is seen while booting thunderx1 platforms using acpi:

[    7.927837] ACPI GTDT: [Firmware Bug]: failed to get the Watchdog base address. 
[    7.935411] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 
[    7.944184] Mem abort info: 
[    7.946967]   ESR = 0x96000005 
[    7.950010]   EC = 0x25: DABT (current EL), IL = 32 bits 
[    7.955310]   SET = 0, FnV = 0 
[    7.958353]   EA = 0, S1PTW = 0 
[    7.961482] Data abort info: 
[    7.964352]   ISV = 0, ISS = 0x00000005 
[    7.968176]   CM = 0, WnR = 0 
[    7.971132] [0000000000000028] user address but active_mm is swapper 
[    7.977475] Internal error: Oops: 96000005 [#1] SMP 
[    7.982342] Modules linked in: 
[    7.985390] CPU: 15 PID: 1 Comm: swapper/0 Not tainted 5.10.0-0.rc3.68.eln105.aarch64 #1 
[    7.993468] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019 
[    8.000852] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--) 
[    8.006856] pc : __ipi_send_mask+0x60/0x114 
[    8.011032] lr : smp_cross_call+0x3c/0xd0 
[    8.015030] sp : ffff8000128efa00 
[    8.018333] x29: ffff8000128efa00 x28: 0000000000000000  
[    8.023638] x27: ffff8000110e0538 x26: ffff800011171960  
[    8.028942] x25: 0000000000000006 x24: 0000000000000000  
[    8.034245] x23: ffff8000115e1000 x22: ffff800011998188  
[    8.039550] x21: ffff800010c0e0f0 x20: ffff800010c0e0f0  
[    8.044854] x19: ffff000100022c20 x18: 00000000fffffffe  
[    8.050158] x17: 0000000071254848 x16: 000000001055a9cd  
[    8.055461] x15: 0000000000000020 x14: ffffffffffffffff  
[    8.060765] x13: ffff8000928efc68 x12: 0000000000000018  
[    8.066069] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f  
[    8.071372] x9 : ffff80001002761c x8 : 00000000000004e4  
[    8.076676] x7 : 000003f02abd11d0 x6 : 00000000ffffffff  
[    8.081980] x5 : ffff800010f9d800 x4 : ffff800010f9d8c0  
[    8.087284] x3 : ffff800010c0daf0 x2 : 0000000000000000  
[    8.092588] x1 : ffff800010c0e0f0 x0 : 0000000000000000  
[    8.097892] Call trace: 
[    8.100330]  __ipi_send_mask+0x60/0x114 
[    8.104155]  smp_cross_call+0x3c/0xd0 
[    8.107807]  smp_send_reschedule+0x3c/0x50 
[    8.111894]  resched_curr+0x70/0xb0 
[    8.115372]  check_preempt_curr+0x58/0x90 
[    8.119370]  ttwu_do_wakeup+0x2c/0x194 
[    8.123109]  try_to_wake_up+0x248/0x5b0 
[    8.126934]  wake_up_process+0x24/0x30 
[    8.130676]  devtmpfs_submit_req+0x90/0xe0 
[    8.134761]  devtmpfs_create_node+0xa8/0xec 
[    8.138934]  device_add+0x474/0x4a0 
[    8.142412]  device_create+0x134/0x16c 
[    8.146153]  unix98_pty_init+0x214/0x22c 
[    8.150065]  pty_init+0x1c/0x2c 
[    8.153196]  do_one_initcall+0x50/0x280 
[    8.157021]  do_initcalls+0x104/0x144 
[    8.160673]  kernel_init_freeable+0x168/0x1c0 
[    8.165019]  kernel_init+0x20/0x134 
[    8.168498]  ret_from_fork+0x10/0x18 
[    8.172066] Code: a90363f7 aa0103f5 d000a557 f9401260 (b9402800)  
[    8.178173] ---[ end trace e117d9bdf84db237 ]--- 
[    8.182780] Kernel panic - not syncing: Oops: Fatal exception 
[    8.188528] SMP: stopping secondary CPUs 
[    9.242445] SMP: failed to stop secondary CPUs 2,15 
[    9.247311] Kernel Offset: disabled 
[    9.250788] CPU features: 0x0040002,69101108 
[    9.255046] Memory Limit: none 
[    9.258106] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- 
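
For readers less familiar with the "Mem abort info" lines, here is a minimal sketch of how the ESR value in the splat breaks down into the fields the kernel prints. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with DFSC in ISS bits 5:0 and WnR in ISS bit 6 for data aborts) follows the Arm architecture definition of ESR_ELx. The function name decode_esr and the small fault-code table are illustrative only and are not part of the kernel.

```python
# Sketch: split an AArch64 ESR_ELx value into the fields shown in the oops.
# decode_esr and TRANSLATION_FAULTS are hypothetical helper names.

# Partial table: only the translation-fault status codes are listed here.
TRANSLATION_FAULTS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length, 1 = 32-bit instruction
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    dfsc = iss & 0x3F            # data fault status code (data aborts only)
    wnr = (iss >> 6) & 0x1       # 1 = write, 0 = read (data aborts only)
    return {
        "ec": ec,
        "il": il,
        "iss": iss,
        "dfsc": dfsc,
        "wnr": wnr,
        "fault": TRANSLATION_FAULTS.get(dfsc, "not in this partial table"),
    }


if __name__ == "__main__":
    fields = decode_esr(0x96000005)  # ESR value from the splat above
    for name, value in fields.items():
        print(name, hex(value) if isinstance(value, int) else value)
```

For the value in this report, EC comes out as 0x25 (data abort at the current EL) and WnR as 0, matching the log: the fault was a read through a NULL-based pointer at offset 0x28.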


2. What is the Version-Release number of the kernel:

Any Fedora kernel based on upstream v5.10. I bisected this to
commit 64b499d8df40 "irqchip/gic-v3: Configure SGIs as standard interrupts"

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

100% reproducible. Boot a v5.10-based kernel with acpi=on.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Mark Salter 2020-11-16 15:01:01 UTC
The problem seems to be rooted in the firmware. The watchdog timer interrupt is 0, which corresponds to an IPI after commit 64b499d8df40. The zero GSI for the watchdog is clearly wrong, but it was harmless (apart from a non-functioning WDT) until now, when it breaks IPI0.
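
To illustrate why a GSI of 0 cannot be a valid watchdog interrupt, here is a rough sketch that classifies a GIC interrupt ID by range. The ranges (SGIs 0-15, PPIs 16-31, SPIs 32-1019) come from the GIC architecture; the assumption that the GSI is used directly as the GIC interrupt ID, and the function name classify_intid, are simplifications for illustration. Extended and LPI ranges are ignored.

```python
# Sketch: classify a GIC interrupt ID by architectural range.
# classify_intid is a hypothetical helper, not a kernel function.

def classify_intid(intid: int) -> str:
    if 0 <= intid <= 15:
        return "SGI"    # software generated interrupts, used for IPIs
    if 16 <= intid <= 31:
        return "PPI"    # private peripheral interrupts, per CPU
    if 32 <= intid <= 1019:
        return "SPI"    # shared peripheral interrupts, e.g. device IRQs
    return "other"      # special, LPI, or extended ranges, not covered here


if __name__ == "__main__":
    # The firmware in this report describes the watchdog interrupt as 0.
    print(classify_intid(0))
```

A platform device such as the SBSA watchdog would normally be wired to an SPI, so an ID in the SGI range points at a bad firmware table entry.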

Comment 2 Mark Salter 2021-03-10 17:56:04 UTC
As noted in comment 1, this is a firmware issue. A kernel command-line option works around it, so I see no need to clutter up the kernel code with some sort of quirk for EOL platforms. To work around the problem, add "initcall_blacklist=gtdt_sbsa_gwdt_init" to the kernel command line.
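
As a quick way to confirm the workaround is in effect on a booted system, here is a small sketch that checks a kernel command line string for the blacklisted initcall. It relies on initcall_blacklist taking a comma-separated list of function names, as described in the kernel parameter documentation. The function name initcall_blacklisted is hypothetical.

```python
# Sketch: check whether an initcall is listed in initcall_blacklist=.
# initcall_blacklisted is a hypothetical helper for illustration.

def initcall_blacklisted(cmdline: str, initcall: str) -> bool:
    prefix = "initcall_blacklist="
    for token in cmdline.split():
        if token.startswith(prefix):
            # The option value is a comma-separated list of function names.
            names = token[len(prefix):].split(",")
            if initcall in names:
                return True
    return False


if __name__ == "__main__":
    example = "ro acpi=on initcall_blacklist=gtdt_sbsa_gwdt_init"
    print(initcall_blacklisted(example, "gtdt_sbsa_gwdt_init"))
```

On a running machine the string to pass in would be the contents of /proc/cmdline.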

