
Bug 1898161

Summary: aarch64: kernel panic booting thunderx1 platforms with acpi
Product: [Fedora] Fedora Reporter: Mark Salter <msalter>
Component: kernel Assignee: Mark Salter <msalter>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhide CC: acaringi, adscvr, airlied, bskeggs, hdegoede, itamar, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mjg59, pbrobinson, ptalbert, pwhalen, steved
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-10 17:56:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418    

Description Mark Salter 2020-11-16 14:57:03 UTC
1. Please describe the problem:

The following splat is seen while booting thunderx1 platforms using acpi:

[    7.927837] ACPI GTDT: [Firmware Bug]: failed to get the Watchdog base address. 
[    7.935411] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 
[    7.944184] Mem abort info: 
[    7.946967]   ESR = 0x96000005 
[    7.950010]   EC = 0x25: DABT (current EL), IL = 32 bits 
[    7.955310]   SET = 0, FnV = 0 
[    7.958353]   EA = 0, S1PTW = 0 
[    7.961482] Data abort info: 
[    7.964352]   ISV = 0, ISS = 0x00000005 
[    7.968176]   CM = 0, WnR = 0 
[    7.971132] [0000000000000028] user address but active_mm is swapper 
[    7.977475] Internal error: Oops: 96000005 [#1] SMP 
[    7.982342] Modules linked in: 
[    7.985390] CPU: 15 PID: 1 Comm: swapper/0 Not tainted 5.10.0-0.rc3.68.eln105.aarch64 #1 
[    7.993468] Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS F02 08/06/2019 
[    8.000852] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--) 
[    8.006856] pc : __ipi_send_mask+0x60/0x114 
[    8.011032] lr : smp_cross_call+0x3c/0xd0 
[    8.015030] sp : ffff8000128efa00 
[    8.018333] x29: ffff8000128efa00 x28: 0000000000000000  
[    8.023638] x27: ffff8000110e0538 x26: ffff800011171960  
[    8.028942] x25: 0000000000000006 x24: 0000000000000000  
[    8.034245] x23: ffff8000115e1000 x22: ffff800011998188  
[    8.039550] x21: ffff800010c0e0f0 x20: ffff800010c0e0f0  
[    8.044854] x19: ffff000100022c20 x18: 00000000fffffffe  
[    8.050158] x17: 0000000071254848 x16: 000000001055a9cd  
[    8.055461] x15: 0000000000000020 x14: ffffffffffffffff  
[    8.060765] x13: ffff8000928efc68 x12: 0000000000000018  
[    8.066069] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f  
[    8.071372] x9 : ffff80001002761c x8 : 00000000000004e4  
[    8.076676] x7 : 000003f02abd11d0 x6 : 00000000ffffffff  
[    8.081980] x5 : ffff800010f9d800 x4 : ffff800010f9d8c0  
[    8.087284] x3 : ffff800010c0daf0 x2 : 0000000000000000  
[    8.092588] x1 : ffff800010c0e0f0 x0 : 0000000000000000  
[    8.097892] Call trace: 
[    8.100330]  __ipi_send_mask+0x60/0x114 
[    8.104155]  smp_cross_call+0x3c/0xd0 
[    8.107807]  smp_send_reschedule+0x3c/0x50 
[    8.111894]  resched_curr+0x70/0xb0 
[    8.115372]  check_preempt_curr+0x58/0x90 
[    8.119370]  ttwu_do_wakeup+0x2c/0x194 
[    8.123109]  try_to_wake_up+0x248/0x5b0 
[    8.126934]  wake_up_process+0x24/0x30 
[    8.130676]  devtmpfs_submit_req+0x90/0xe0 
[    8.134761]  devtmpfs_create_node+0xa8/0xec 
[    8.138934]  device_add+0x474/0x4a0 
[    8.142412]  device_create+0x134/0x16c 
[    8.146153]  unix98_pty_init+0x214/0x22c 
[    8.150065]  pty_init+0x1c/0x2c 
[    8.153196]  do_one_initcall+0x50/0x280 
[    8.157021]  do_initcalls+0x104/0x144 
[    8.160673]  kernel_init_freeable+0x168/0x1c0 
[    8.165019]  kernel_init+0x20/0x134 
[    8.168498]  ret_from_fork+0x10/0x18 
[    8.172066] Code: a90363f7 aa0103f5 d000a557 f9401260 (b9402800)  
[    8.178173] ---[ end trace e117d9bdf84db237 ]--- 
[    8.182780] Kernel panic - not syncing: Oops: Fatal exception 
[    8.188528] SMP: stopping secondary CPUs 
[    9.242445] SMP: failed to stop secondary CPUs 2,15 
[    9.247311] Kernel Offset: disabled 
[    9.250788] CPU features: 0x0040002,69101108 
[    9.255046] Memory Limit: none 
[    9.258106] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- 
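For readers cross-checking the "Mem abort info" lines above, here is a minimal sketch of how the ESR value splits into the fields the kernel prints. The bit positions follow the Arm architecture layout of ESR_ELx for data aborts. The helper name decode_esr is hypothetical and not part of any kernel tooling.

```python
# Sketch: split an AArch64 ESR_ELx value into the fields shown in the oops.
# decode_esr is a hypothetical helper name used for illustration only.

def decode_esr(esr: int) -> dict:
    iss = esr & 0x1FFFFFF            # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,    # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,     # 1 means a 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,     # 0 means the faulting access was a read
        "DFSC": iss & 0x3F,          # data fault status code: ISS bits [5:0]
    }

fields = decode_esr(0x96000005)
print({name: hex(value) for name, value in fields.items()})
```

For 0x96000005 this gives EC = 0x25, ISS = 0x5 and WnR = 0, matching the log. DFSC = 0x05 is the encoding for a level 1 translation fault, which fits a read through a NULL pointer at offset 0x28.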


2. What is the Version-Release number of the kernel:

Any Fedora kernel based on upstream v5.10. I bisected this to
commit 64b499d8df40 "irqchip/gic-v3: Configure SGIs as standard interrupts".

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

100% reproducible. Boot a v5.10-based kernel with acpi=on.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Mark Salter 2020-11-16 15:01:01 UTC
The problem seems to be rooted in the firmware. The firmware reports the watchdog timer interrupt as GSI 0, which corresponds to an IPI after commit 64b499d8df40. The zero GSI for the watchdog is clearly wrong, but it was harmless (apart from a non-functioning WDT) until now, when it breaks IPI0.
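To illustrate why a watchdog GSI of 0 is suspect, here is a rough sketch of the GIC interrupt ID ranges. It assumes the GSI from the firmware table is used directly as the GIC INTID, and that a platform peripheral such as the watchdog would normally be wired to an SPI. Both function names are hypothetical.

```python
# Sketch: classify a GIC interrupt ID by its architectural range.
# Assumption: the ACPI GSI is used directly as the GIC INTID.

def classify_intid(intid: int) -> str:
    if 0 <= intid <= 15:
        return "SGI"    # software generated interrupts, used for IPIs
    if 16 <= intid <= 31:
        return "PPI"    # private peripheral interrupts, per CPU
    if 32 <= intid <= 1019:
        return "SPI"    # shared peripheral interrupts
    return "other"      # special or extended ranges, not covered here

def watchdog_gsi_is_plausible(gsi: int) -> bool:
    # Assumption: a watchdog interrupt is expected to be an SPI.
    return classify_intid(gsi) == "SPI"

print(classify_intid(0), watchdog_gsi_is_plausible(0))
```

A GSI of 0 lands in the SGI range, so once SGIs are handled as standard interrupts, a request for interrupt 0 collides with the first IPI.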

Comment 2 Mark Salter 2021-03-10 17:56:04 UTC
As noted in comment 1, this is a firmware issue. A kernel command-line option works around it, so I see no need to clutter up the kernel code with some sort of quirk for EOL platforms. To work around this, add "initcall_blacklist=gtdt_sbsa_gwdt_init" to the kernel command line.
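To confirm the option is in effect on a booted system, one could inspect the kernel command line. The sketch below parses a command-line string and collects the names given to initcall_blacklist, which takes a comma-separated list. The function name and the sample command line are hypothetical, and treating repeated occurrences of the option as cumulative is an assumption of this sketch.

```python
# Sketch: list the initcalls named in initcall_blacklist on a kernel
# command line. blacklisted_initcalls is a hypothetical helper name.

def blacklisted_initcalls(cmdline: str) -> set:
    names = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "initcall_blacklist" and sep:
            # The option value is a comma-separated list of function names.
            names.update(name for name in value.split(",") if name)
    return names

# Hypothetical example command line; on a live system the string could be
# read from /proc/cmdline instead.
example = "ro acpi=on initcall_blacklist=gtdt_sbsa_gwdt_init"
print("gtdt_sbsa_gwdt_init" in blacklisted_initcalls(example))
```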