Bug 1390937

Summary: mem region issue with aarch64 NUMA host (and potentially other devices)
Product: [Fedora] Fedora
Component: kernel
Version: 25
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: Reopened
Whiteboard: AcceptedFreezeException
Fixed In Version: kernel-4.8.6-300.fc25
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2016-11-08 04:59:01 UTC
Reporter: Peter Robinson <pbrobinson>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gansalmon, gmarr, ichavero, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, pbrobinson, rrichter
Bug Blocks: 245418, 1277290

Description Peter Robinson 2016-11-02 09:46:32 UTC
Some aarch64 hosts are crashing during install, apparently because memory in specific regions is being overwritten.

Fedora thread:
https://lists.fedoraproject.org/archives/list/arm@lists.fedoraproject.org/thread/MBUUFRQ2RSW2OT4VBN22VDMRF5VTNCU5/

Upstream thread:
https://www.spinics.net/lists/arm-kernel/msg535191.html

Comment 1 Fedora Update System 2016-11-02 10:42:51 UTC
kernel-4.8.6-300.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-01c178cf9b

Comment 2 Fedora Blocker Bugs Application 2016-11-02 10:47:14 UTC
Proposed as a Freeze Exception for 25-final by Fedora user pbrobinson using the blocker tracking app because:

 Issues installing on particular types of aarch64 platforms (NUMA, potentially others). As an AltArch this isn't a blocker but is an exception.

Comment 3 Fedora Update System 2016-11-02 16:54:56 UTC
kernel-4.8.6-300.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-01c178cf9b

Comment 4 Robert Richter (Marvell) 2016-11-02 17:48:57 UTC
(In reply to Fedora Update System from comment #1)
> kernel-4.8.6-300.fc25 has been submitted as an update to Fedora 25.
> https://bodhi.fedoraproject.org/updates/FEDORA-2016-01c178cf9b

I created an aarch64 build from .src.rpm here:

 http://koji.fedoraproject.org/koji/buildinfo?buildID=814228

A dual node ThunderX system boots fine now:

 4.8.6-300.201611020438.fc25.aarch64

Thanks,

-Robert

Comment 5 Fedora Update System 2016-11-03 18:30:37 UTC
kernel-4.8.6-300.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 6 Geoffrey Marr 2016-11-08 01:01:58 UTC
Discussed during the 2016-11-07 blocker review meeting: [1]

The decision was made to classify this bug as an AcceptedFreezeException.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-11-07/f25-blocker-review.2016-11-07-17.01.txt

Comment 7 Fedora Update System 2016-11-08 04:59:01 UTC
kernel-4.8.6-300.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 8 Robert Richter (Marvell) 2017-02-14 16:23:20 UTC
Fixed for v4.11:

 f073bdc51771 mm: don't dereference struct page fields of invalid pages
 6d526ee26ccd arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA

-Robert
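
For readers who have not followed the upstream thread, the idea suggested by the two commit titles can be sketched in user space as below. This is an illustration only and not the kernel code: `Page`, `MEMMAP`, `same_zone` and `count_pages_in_zone` are hypothetical names, and `pfn_valid` here merely mimics the role of the kernel helper of the same name. The sketch assumes a memory map in which some page frame numbers (pfns) fall into holes with no usable page descriptor, so every access checks validity before reading a field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a page descriptor."""
    zone: str  # name of the zone this page frame belongs to


# Hypothetical memory map indexed by pfn. None marks a hole, i.e. a pfn
# whose descriptor must not be read.
MEMMAP: List[Optional[Page]] = [
    Page("Normal"),
    Page("Normal"),
    None,
    None,
    Page("Normal"),
]


def pfn_valid(pfn: int) -> bool:
    """Report whether a pfn has a descriptor that is safe to read."""
    return 0 <= pfn < len(MEMMAP) and MEMMAP[pfn] is not None


def same_zone(start_pfn: int, end_pfn: int) -> Optional[bool]:
    """Compare the zones of two pfns, or return None if either is a hole.

    The validity check comes first so that no field of an invalid page
    is ever read.
    """
    if not (pfn_valid(start_pfn) and pfn_valid(end_pfn)):
        return None
    return MEMMAP[start_pfn].zone == MEMMAP[end_pfn].zone


def count_pages_in_zone(start_pfn: int, end_pfn: int, zone: str) -> int:
    """Count valid pages of a zone in an inclusive pfn range, skipping holes."""
    count = 0
    for pfn in range(start_pfn, end_pfn + 1):
        if not pfn_valid(pfn):
            continue  # hole inside the range: skip it instead of reading it
        if MEMMAP[pfn].zone == zone:
            count += 1
    return count
```

With this layout, `same_zone(0, 2)` declines to answer because pfn 2 is a hole, while `count_pages_in_zone(0, 4, "Normal")` walks the whole range and simply skips the two missing entries.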

Comment 9 Robert Richter (Marvell) 2017-05-22 15:17:32 UTC
The temporary fix:

 arm64: mm: Fix memmap to be initialized for the entire section

should be reverted in all kernels from 4.11 onward:

Rawhide: a3c290f92009d7c4325e3abbebd277b99b60e707
F26: 67b5ed5428fa24094dc3314060405a47d7c5fb41