Bug 1874117 - 5.8.3-300.fc33.aarch64 kernel panic on boot (X-Gene PMU)
Summary: 5.8.3-300.fc33.aarch64 kernel panic on boot (X-Gene PMU)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Peter Robinson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: ARMTracker F33BetaBlocker F33FinalBlocker
Reported: 2020-08-31 15:34 UTC by Paul Whalen
Modified: 2020-10-13 20:49 UTC
CC List: 25 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-13 20:49:50 UTC
Type: Bug
Embargoed:


Attachments
full boot log (47.21 KB, text/plain)
2020-08-31 15:36 UTC, Paul Whalen
Fix uninitialized variable in xgene PMU driver (3.91 KB, application/mbox)
2020-09-02 01:10 UTC, Mark Salter

Description Paul Whalen 2020-08-31 15:34:34 UTC
1. Please describe the problem:
When attempting to boot 5.8.3-300.fc33 on an Ampere eMAG, the kernel panics.

2. What is the Version-Release number of the kernel:
5.8.3-300.fc33

Panic:

[    9.925276] xgene-pmu APMC0D83:00: X-Gene PMU version 3
[    9.938064] Unable to handle kernel read from unreadable memory at virtual address 0000000000004006
[    9.947101] Mem abort info:
[    9.949882]   ESR = 0x96000004
[    9.952927]   EC = 0x25: DABT (current EL), IL = 32 bits
[    9.958225]   SET = 0, FnV = 0
[    9.961265]   EA = 0, S1PTW = 0
[    9.964395] Data abort info:
[    9.967262]   ISV = 0, ISS = 0x00000004
[    9.971083]   CM = 0, WnR = 0
[    9.974041] [0000000000004006] user address but active_mm is swapper
[    9.980381] Internal error: Oops: 96000004 [#1] SMP
[    9.985246] Modules linked in:
[    9.988289] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.3-300.fc33.aarch64 #1
[    9.995583] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS HVE104N-1.12 11/29/2019
[   10.005395] pstate: 00400005 (nzcv daif +PAN -UAO BTYPE=--)
[   10.010957] pc : string+0x50/0x100
[   10.014346] lr : vsnprintf+0x160/0x750
[   10.018081] sp : ffff800012b4b760
[   10.021381] x29: ffff800012b4b760 x28: 000000000000000c 
[   10.026679] x27: ffff8000113610d5 x26: ffff8000113610d5 
[   10.031977] x25: 0000000000000020 x24: 0000000000000000 
[   10.037275] x23: 00000000ffffffe8 x22: ffff800010f8e628 
[   10.042572] x21: ffff800012b4b8f0 x20: 0000000000000000 
[   10.047870] x19: 0000000000000000 x18: 00000000fffffffc 
[   10.053167] x17: 000000000000002d x16: 0000000000000001 
[   10.058465] x15: 0000000000000020 x14: 0000000000000000 
[   10.063762] x13: 0000000000000000 x12: 071c71c71c71c71c 
[   10.069060] x11: 00000000ffffff76 x10: ffff800012b4b8f0 
[   10.074357] x9 : ffff8000109e97d8 x8 : 00000000ffffffff 
[   10.079655] x7 : 000000000000000b x6 : 0000000000000000 
[   10.084952] x5 : 0000000000000000 x4 : 0000000000000000 
[   10.090250] x3 : ffff0a00ffffff04 x2 : 0000000000004006 
[   10.095547] x1 : ffffffffffffffff x0 : 000000000000000c 
[   10.100845] Call trace:
[   10.103280]  string+0x50/0x100
[   10.106321]  vsnprintf+0x160/0x750
[   10.109711]  devm_kvasprintf+0x5c/0xb4
[   10.113446]  devm_kasprintf+0x54/0x60
[   10.117096]  __devm_ioremap_resource+0xdc/0x1a0
[   10.121613]  devm_ioremap_resource+0x14/0x20
[   10.125871]  acpi_get_pmu_hw_inf.isra.0+0x84/0x15c
[   10.130648]  acpi_pmu_dev_add+0xbc/0x21c
[   10.134558]  acpi_ns_walk_namespace+0x16c/0x1e4
[   10.139075]  acpi_walk_namespace+0xb4/0xfc
[   10.143157]  xgene_pmu_probe_pmu_dev+0x7c/0xe0
[   10.147586]  xgene_pmu_probe.part.0+0x2c0/0x310
[   10.152103]  xgene_pmu_probe+0x54/0x64
[   10.155839]  platform_drv_probe+0x60/0xb4
[   10.159835]  really_probe+0xe8/0x4a0
[   10.163397]  driver_probe_device+0xe4/0x100
[   10.167566]  device_driver_attach+0xcc/0xd4
[   10.171736]  __driver_attach+0xb0/0x17c
[   10.175558]  bus_for_each_dev+0x6c/0xb0
[   10.179380]  driver_attach+0x30/0x40
[   10.182942]  bus_add_driver+0x154/0x250
[   10.186764]  driver_register+0x84/0x140
[   10.190586]  __platform_driver_register+0x54/0x60
[   10.195278]  xgene_pmu_driver_init+0x28/0x34
[   10.199535]  do_one_initcall+0x40/0x204
[   10.203358]  do_initcalls+0x104/0x144
[   10.207007]  kernel_init_freeable+0x198/0x210
[   10.211352]  kernel_init+0x20/0x12c
[   10.214827]  ret_from_fork+0x10/0x18
[   10.218391] Code: 91000400 110004e1 eb08009f 540000c0 (38646846) 
[   10.224484] ---[ end trace f08c10566496a703 ]---
[   10.229165] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   10.236815] SMP: stopping secondary CPUs
[   10.241945] Kernel Offset: 0x40000 from 0xffff800010000000
[   10.247416] PHYS_OFFSET: 0x80000000
[   10.250892] CPU features: 0x240002,20802008
[   10.255061] Memory Limit: none
[   10.258107] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
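
The "Mem abort info" and "Data abort info" lines in the log above are a decode of the ESR value. As a rough illustration (not kernel code), the same fields can be extracted by hand using the ESR_ELx bit layout from the Arm architecture. The helper names below are made up for this sketch.

```c
#include <stdint.h>

/* ESR_ELx layout: EC = bits [31:26], IL = bit [25], ISS = bits [24:0].
 * Helper names are hypothetical, chosen only for this example. */
unsigned esr_ec(uint32_t esr)  { return (esr >> 26) & 0x3f; }
unsigned esr_il(uint32_t esr)  { return (esr >> 25) & 0x1; }
uint32_t esr_iss(uint32_t esr) { return esr & 0x1ffffff; }

/* For data aborts, the ISS carries WnR in bit 6 (1 = write, 0 = read)
 * and the data fault status code (DFSC) in bits [5:0]. */
unsigned esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }
unsigned esr_dfsc(uint32_t esr) { return esr & 0x3f; }
```

For 0x96000004 these give EC = 0x25, IL = 1, ISS = 0x4 and WnR = 0, matching the log. The fault status code 0x04 is a level 0 translation fault, which is consistent with a read through the bogus pointer 0x4006 held in x2.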

Comment 1 Paul Whalen 2020-08-31 15:36:09 UTC
Created attachment 1713174
full boot log

Comment 2 Paul Whalen 2020-08-31 16:56:57 UTC
Last working kernel was 5.8.2-300.fc33.aarch64

5.8.3-300.fc33.aarch64 panics on boot.

Comment 3 Mark Salter 2020-09-02 01:10:04 UTC
Created attachment 1713385
Fix uninitialized variable in xgene PMU driver

A recent v5.9-rc1 patch uncovered a long-standing bug in the X-Gene PMU driver. This patch initializes the resource struct so that a later reference to a bad pointer is avoided.
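
For readers unfamiliar with this class of bug, here is a minimal userspace sketch of the pattern, not the driver code or the attached patch. The call trace suggests that a string pointer inside a stack-allocated resource struct was formatted with %s before anything had set it. All names below (demo_resource, demo_fill_range, demo_format_label, demo_describe) are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a resource descriptor; this is NOT the
 * kernel's struct resource. */
struct demo_resource {
    unsigned long start;
    unsigned long end;
    const char *name;   /* optional label, may legitimately be NULL */
};

/* Fills in only the address range and never touches name, much like a
 * callback that only knows about the memory window. */
void demo_fill_range(struct demo_resource *res,
                     unsigned long start, unsigned long len)
{
    res->start = start;
    res->end = start + len - 1;
}

/* Formats a label. Safe only if name is NULL or a valid string; a stale
 * stack value here would be dereferenced by %s. */
int demo_format_label(char *buf, size_t len, const struct demo_resource *res)
{
    return snprintf(buf, len, "%s [0x%lx-0x%lx]",
                    res->name ? res->name : "(unnamed)",
                    res->start, res->end);
}

/* Zero-initializing the struct makes name NULL instead of stack garbage,
 * so the NULL check in demo_format_label() can do its job. */
int demo_describe(char *buf, size_t len,
                  unsigned long start, unsigned long size)
{
    struct demo_resource res = {0};

    demo_fill_range(&res, start, size);
    return demo_format_label(buf, len, &res);
}
```

Without the `= {0}` initializer, `res.name` would hold whatever was left on the stack, and formatting it could fault in the same way as the `string()` frame in the trace above.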

Comment 4 Mark Salter 2020-09-02 01:10:45 UTC
I'll send a patch upstream tomorrow.

Comment 5 Peter Robinson 2020-09-02 13:36:09 UTC
Patch pushed to 5.8.x for F-33/32/31.

Thanks for the patch Mark.

Comment 6 Paul Whalen 2020-09-02 15:30:06 UTC
Proposing as a blocker for F33 Beta: this greatly inhibits testing on aarch64.

Comment 7 Paul Whalen 2020-09-02 18:05:49 UTC
Affects any device that uses the X-Gene PMU driver, not just the Ampere eMag.

Comment 8 František Zatloukal 2020-09-04 08:09:13 UTC
Accepted as Beta Blocker per voting in https://pagure.io/fedora-qa/blocker-review/issue/59 .

Bug hinders execution of required Beta test plans or dramatically reduces test coverage on aarch64.

Comment 9 Paul Whalen 2020-09-04 13:46:14 UTC
5.8.6-301.fc33.aarch64 boots as expected on the eMAG.

Thanks again Mark.

Comment 10 Fedora Update System 2020-09-08 16:58:42 UTC
FEDORA-2020-5081eec059 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5081eec059

Comment 11 Fedora Update System 2020-09-08 17:04:36 UTC
FEDORA-2020-5081eec059 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 12 Peter Robinson 2020-10-07 08:32:03 UTC
This isn't properly fixed; there's a new fix headed upstream for 5.10:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=a76b8236edcf

Comment 13 Fedora Blocker Bugs Application 2020-10-07 08:35:18 UTC
Proposed as a Blocker for 33-final by Fedora user pbrobinson using the blocker tracking app because:

Issues on enterprise aarch64 Ampere eMAG systems, including the HW we use for the builders.

Comment 14 Fedora Update System 2020-10-08 11:42:31 UTC
FEDORA-2020-9664e2f1d2 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2

Comment 15 Fedora Update System 2020-10-08 22:19:28 UTC
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-9664e2f1d2`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 16 Fedora Update System 2020-10-12 21:57:04 UTC
FEDORA-2020-9664e2f1d2 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 17 Adam Williamson 2020-10-13 17:11:39 UTC
The update apparently wasn't marked as fixing the bug; can we close it or is something else needed? Thanks!

Comment 18 Peter Robinson 2020-10-13 18:03:42 UTC
(In reply to Adam Williamson from comment #17)
> The update apparently wasn't marked as fixing the bug; can we close it or is
> something else needed? Thanks!

Which update? As part of the 5.8.14 kernel I switched to a newer, more robust fix that is landing upstream in 5.10; it seems the changelog was trimmed. So IMO this can be closed.

* Wed Oct  7 2020 Peter Robinson <pbrobinson>
- Fix aarch64 boot crash on BTI capable systems
- Fix boot crash on aarch64 Ampere eMAG systems (rhbz #1874117)
 
* Thu Oct  1 12:09:16 CDT 2020 Justin M. Forbes <jforbes> - 5.8.13-300
- Linux v5.8.13

Comment 19 Adam Williamson 2020-10-13 20:49:50 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2020-9664e2f1d2 - #c16 above says it was pushed to stable. That was the 5.8.14-300 update. So if you think that fixed it, let's go ahead and close.

