Bug 1920183 - Overeager OOM kills on 5.10 kernels on 32bit arm with lpae
Summary: Overeager OOM kills on 5.10 kernels on 32bit arm with lpae
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2021-01-25 17:36 UTC by Kevin Fenzi
Modified: 2021-06-07 07:59 UTC
CC List: 24 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
dmesg from f33 + 5.10.10 + 40GB mem + direct-boot (111.80 KB, text/plain)
2021-01-25 17:36 UTC, Kevin Fenzi
libvirt xml for f34 32bit arm guest vm (5.58 KB, text/plain)
2021-04-30 16:22 UTC, Kevin Fenzi

Description Kevin Fenzi 2021-01-25 17:36:52 UTC
Created attachment 1750648 [details]
dmesg from f33 + 5.10.10 + 40GB mem + direct-boot

Fedora 32bit arm builders were: 

Fedora 32, lpae kernel 5.6.x, 24GB RAM, direct kernel boot, and were operating fine.

On upgrading to: 

Fedora 33, lpae kernel 5.10.x, any amount of RAM from 8GB to 40GB, UEFI or direct kernel boot,

kojid gets OOM killed during builds that use a lot of resources. Both python3.8 package builds with tests enabled and gcc builds show this issue; there may be others. The build goes along and then, when running the tests, kojid is OOM killed. ;( 

We have also tried Fedora 32 userspace with 5.10.x kernel. 

Sadly, there was a configuration change with the Fedora 5.7 kernels, and all Fedora kernels from 5.7 to 5.10.9 have HIGHPTE set, which causes 32bit arm lpae guests to 'pause', so we can't use any of them. ;( 
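
Whether a particular kernel build has HIGHPTE set can be checked from its config file. The following is only a minimal sketch: it assumes the config is shipped as /boot/config-<release> (as Fedora kernels normally do), and the helper name `kernel_option` is made up for illustration.

```python
import platform
from pathlib import Path


def kernel_option(config_text, option):
    """Return the value of CONFIG_<option> ('y', 'm', ...) or None if unset."""
    key = f"CONFIG_{option}"
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set".
        if line == f"# {key} is not set":
            return None
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Assumption: the running kernel's config lives at /boot/config-<release>.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        print("HIGHPTE:", kernel_option(path.read_text(), "HIGHPTE"))
    else:
        print(f"{path} not found")
```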

Basically it seems like between 5.6.x and 5.10.x the OOM handling got more aggressive or the 32bit lpae case changed to use the memory it has less effectively. 
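
Something that might help narrow this down is watching lowmem separately from highmem during a build, since on a 32-bit kernel most kernel allocations have to come from lowmem. Kernels built with CONFIG_HIGHMEM report LowTotal/LowFree and HighTotal/HighFree in /proc/meminfo. A rough sketch of reading them follows; the function names are hypothetical and this is not something that was run on the builders.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into a {field: value} dict (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def lowmem_summary(values):
    """Return (low_free, low_total, high_free, high_total) in kB, or None.

    The Low*/High* fields only exist on kernels built with CONFIG_HIGHMEM.
    """
    keys = ("LowFree", "LowTotal", "HighFree", "HighTotal")
    if not all(key in values for key in keys):
        return None
    return tuple(values[key] for key in keys)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(lowmem_summary(parse_meminfo(f.read())))
```

If LowFree drops close to zero while HighFree stays large at the time of the kill, that would point at lowmem pressure rather than overall memory exhaustion.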

I've also tried setting highmem_is_dirtyable to 1 and lowmem_reserve_ratio to 1, to no real effect. 
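
To double-check which values are actually in effect on a builder, both tunables can be read back from /proc/sys/vm. This is a sketch under a few assumptions: highmem_is_dirtyable only exists on kernels built with CONFIG_HIGHMEM, lowmem_reserve_ratio holds several whitespace-separated values rather than one, and the helper names and the `proc_root` parameter are invented for illustration.

```python
from pathlib import Path


def parse_tunable(text):
    """Split a sysctl file's contents into a list of ints."""
    return [int(field) for field in text.split()]


def read_vm_tunable(name, proc_root="/proc/sys/vm"):
    """Return the tunable's values as a list of ints, or None if unreadable."""
    try:
        return parse_tunable((Path(proc_root) / name).read_text())
    except OSError:
        # Missing (e.g. no CONFIG_HIGHMEM) or not readable.
        return None


if __name__ == "__main__":
    for name in ("highmem_is_dirtyable", "lowmem_reserve_ratio"):
        print(name, read_vm_tunable(name))
```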

Will attach dmesg from the Fedora 33 + 5.10.10 + 40GB memory + direct boot case
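
For anyone going through the dmesg, the OOM victims can be pulled out with a small script. The sketch below assumes the "Out of memory: Killed process <pid> (<comm>) ... anon-rss:<n>kB ..." line format used by recent 5.x kernels; the pattern has not been checked against the attached file, and the function name is hypothetical.

```python
import re

# Assumed format of the kernel's OOM-kill line on 5.x kernels:
# "Out of memory: Killed process <pid> (<comm>) total-vm:<n>kB, anon-rss:<n>kB, ..."
OOM_KILL_RE = re.compile(
    r"(?:Out of memory|Memory cgroup out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<comm>.*?)\)"
    r"(?:.*?anon-rss:(?P<anon_rss_kb>\d+)kB)?"
)


def oom_victims(dmesg_text):
    """Return a list of (pid, comm, anon_rss_kb or None) for each OOM kill."""
    victims = []
    for line in dmesg_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match:
            rss = match.group("anon_rss_kb")
            victims.append((
                int(match.group("pid")),
                match.group("comm"),
                int(rss) if rss is not None else None,
            ))
    return victims
```

Calling `oom_victims(open("dmesg.txt").read())` on a saved copy of the log (the file name is a placeholder) would list each killed process, which makes it easy to see whether kojid is always the victim.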

Comment 1 Kevin Fenzi 2021-02-01 23:45:06 UTC
Paul: were you able to reproduce this? Or would you like me to gather more info? if so what?

Comment 2 Paul Whalen 2021-02-03 18:58:45 UTC
(In reply to Kevin Fenzi from comment #1)
> Paul: were you able to reproduce this? Or would you like me to gather more
> info? if so what?

I have not been able to reproduce this when running the builds outside of Koji. Could we set up a builder in staging to try there?

Comment 3 Kevin Fenzi 2021-02-05 20:44:33 UTC
OK. I have set up buildvm-a32-01.stg.iad2.fedoraproject.org with an f33 UEFI install running 5.10.11-200.fc33.armv7hl+lpae.

You will need to: 
* go to https://admin.stg.fedoraproject.org/accounts and use the 'forgot password' link to reset your password.
* log in, enroll a 2fa token, and update your ssh keys (if out of date).
* set up ssh to use our bastion hosts ( https://docs.pagure.org/infra-docs/sysadmin-guide/sops/sshaccess.html )
* ssh to buildvm-a32-01.stg.iad2.fedoraproject.org; you should be able to sudo there with your password/2fa token.

I started a python3.8 build on it: 
https://koji.stg.fedoraproject.org/koji/taskinfo?taskID=90112693

Let me know if there's anything I can further help with.

Comment 4 Kevin Fenzi 2021-04-12 22:02:04 UTC
I did a clean f34 install on buildvm-a32-01.stg and the problem persists. 

I upgraded that vm to 5.12.0-0.rc7.189.fc35.armv7hl+lpae and the problem persists.

Let me know if there is anything I can do to move this along or more info I can gather.

Comment 5 Kevin Fenzi 2021-04-13 18:44:05 UTC
Right now buildvm-a32-01.stg.iad2.fedoraproject.org is f34 and enabled in stg-koji as the only 32 bit arm builder. 

So, I have been using: 

fedpkg clone -a python3.8
cd python3.8
fedpkg srpm
stg-koji build  --scratch f33 python3.8-3.8.9-1.fc35.src.rpm --arch-override armv7hl

The build should go to that builder and get most of the way done; then kojid gets OOM killed and restarts.
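
If the reproducer needs to be repeated for other packages, the steps above could be parameterized along these lines. This is just a sketch: `reproducer_commands` is a made-up helper, the default tag and arch are taken from the commands above, and it only prints the commands instead of running them.

```python
import shlex


def reproducer_commands(package, srpm, tag="f33", arch="armv7hl"):
    """Return the reproduction steps as (argv, working_directory) pairs."""
    return [
        (["fedpkg", "clone", "-a", package], None),
        (["fedpkg", "srpm"], package),
        (["stg-koji", "build", "--scratch", tag, srpm,
          "--arch-override", arch], package),
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for argv, cwd in reproducer_commands(
            "python3.8", "python3.8-3.8.9-1.fc35.src.rpm"):
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + shlex.join(argv))
```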

Comment 6 Kevin Fenzi 2021-04-30 16:22:23 UTC
Created attachment 1777845 [details]
libvirt xml for f34 32bit arm guest vm

Here's the libvirt XML for the f34 test VM on which I last reproduced the problem.

Comment 7 Jeremy Linton 2021-05-10 15:38:50 UTC
Also, I'm wondering now if annobin is causing the problem.

Comment 8 Kevin Fenzi 2021-06-02 21:43:04 UTC
Current status: 

In my testing last week, it seemed the problem was gone in the 5.12.x kernels (or at least much much more rare). 

Based on that, I moved all the builders to f34 and newest kernel. 

However, the issue may not be solved after all. I'd like to leave this open for feedback from users to see when/what/if this still happens to armv7 builds. 
Thanks.

Comment 9 Miro Hrončok 2021-06-07 07:59:39 UTC
This libreoffice build:

https://koji.fedoraproject.org/koji/taskinfo?taskID=69447907

Buildroots:

/var/lib/mock/f35-python-27616570-3666659
/var/lib/mock/f35-python-27620380-3666659
/var/lib/mock/f35-python-27622929-3666659
/var/lib/mock/f35-python-27623229-3666659
/var/lib/mock/f35-python-27623531-3666659

Total time: 4:16:36
Task time: 0:16:19


%prep:

Initialized empty Git repository in /builddir/build/BUILD/libreoffice-7.1.3.2/.git/
+ /usr/bin/git config user.name rpm-build
+ /usr/bin/git config user.email '<rpm-build>'
+ /usr/bin/git config gc.auto 0
+ /usr/bin/git add --force .
+ /usr/bin/git commit --allow-empty -a --author 'rpm-build <rpm-build>' -m 'libreoffice-7.1.3.2 base'
/var/tmp/rpm-tmp.K9GR5j: line 68: 25966 Killed                  /usr/bin/git commit --allow-empty -a --author "rpm-build <rpm-build>" -m "libreoffice-7.1.3.2 base"
error: Bad exit status from /var/tmp/rpm-tmp.K9GR5j (%prep)
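
To find other builds hit the same way, build logs can be scanned for the shell's "Killed" report shown above. A minimal sketch, assuming the "<script>: line <n>: <pid> Killed <command>" layout seen in this log; the function name is hypothetical. Note that "Killed" only says the process received SIGKILL, so the builder's dmesg is still needed to confirm it was the OOM killer.

```python
import re

# Matches the shell's report of a child terminated by SIGKILL, e.g.
# "<script>: line <n>: <pid> Killed    <command>"
KILLED_RE = re.compile(
    r"^(?P<script>\S+): line (?P<line>\d+):\s+(?P<pid>\d+) Killed\s+(?P<command>.+)$"
)


def killed_commands(log_text):
    """Return one dict per 'Killed' report found in a build log."""
    hits = []
    for raw in log_text.splitlines():
        match = KILLED_RE.match(raw.rstrip())
        if match:
            hits.append({
                "script": match.group("script"),
                "script_line": int(match.group("line")),
                "pid": int(match.group("pid")),
                "command": match.group("command"),
            })
    return hits


if __name__ == "__main__":
    sample = ('/var/tmp/rpm-tmp.K9GR5j: line 68: 25966 Killed                  '
              '/usr/bin/git commit --allow-empty -a --author '
              '"rpm-build <rpm-build>" -m "libreoffice-7.1.3.2 base"')
    print(killed_commands(sample))
```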


We are trying again in https://koji.fedoraproject.org/koji/taskinfo?taskID=69473607

Until that builds, we cannot merge the Python 3.10 side tag :/

