Bug 1813089
| Summary: | aarch builder-specific failure with setrlimit (?) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Dave Love <dave.love> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 33 | CC: | airlied, bskeggs, dan, dwmw2, hdegoede, ichavero, itamar, jarodwilson, jeremy, jfeeney, jglisse, jkadlcik, jlinton, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, mlangsdo, msekleta, pbrobinson, praiskup, steved |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | aarch64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-25 11:55:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 245418 | | |
| Attachments: | | | |
Description
Dave Love
2020-03-12 22:49:47 UTC
Hello Dave,
thank you for the report. I tried to download the SRPM and build it locally in mock and it worked fine. Seems like it is really a Copr issue. I will investigate it more in the morning.

I should have mentioned that it may be connected with the copr move, as I'd previously built it in another project:
https://copr.fedorainfracloud.org/coprs/loveshack/livhpc/build/1081587/

The easiest reproducer is to provision an aarch64 builder and do

    [root@ip-172-30-2-18 ~]# mock -r epel-7-aarch64 --shell
    <mock-chroot> sh-4.2# gpg2
    gpg: Fatal: can't disable core dumps: Operation not permitted

I installed strace to see what is going on. This is IMHO the interesting part

    getrlimit(RLIMIT_CORE, 0xffffdfae5c58) = -1 EPERM (Operation not permitted)
    setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=0}) = -1 EPERM (Operation not permitted)

vs. how the output looked on the x86_64 builder

    getrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=RLIM64_INFINITY}) = 0
    setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=RLIM64_INFINITY}) = 0

I tried to do `setenforce 0` on the builder and it didn't seem to help.

Created attachment 1670373
Output of strace gpg2 on aarch64 builder

Created attachment 1670374
Output of strace gpg2 on x86_64 builder
I am continuing to debug this issue and it seems to be systemd-nspawn related. Running mock with just simple chroot works fine

    [root@ip-172-30-2-210 ~]# mock -r epel-7-aarch64 --shell --isolation=simple
    <mock-chroot> sh-4.2# gpg2
    ...
    gpg: Go ahead and type your message ...

This is the most low-level reproducer I am able to come up with

    [root@ip-172-30-2-210 ~]# mock -r epel-7-aarch64 init --disable-plugin=tmpfs
    [root@ip-172-30-2-210 ~]# systemd-nspawn -D /var/lib/mock/epel-7-aarch64/root/
    Spawning container root on /var/lib/mock/epel-7-aarch64/root.
    Press ^] three times within 1s to kill container.
    -bash-4.2# ulimit -c unlimited
    -bash: ulimit: core file size: cannot get limit: Operation not permitted
    -bash-4.2#

It seems to me like either systemd-nspawn or EPEL7 bug ...
@msekletar can you please take a look at this if you know what is going on?

I can not reproduce this on Fedora 32 now. @frostyx, can you re-try?

    kernel-5.6.3-300.fc32.x86_64
    systemd-245.4-1.fc32.x86_64
    qemu-user-static-4.2.0-7.fc32.x86_64

Not even on F31, so I suppose we need to update copr builders if anything and close this bug?

    $ rpm -q kernel systemd qemu-user-static
    kernel-5.3.15-300.fc31.x86_64
    kernel-5.5.10-200.fc31.x86_64
    systemd-243.8-1.fc31.x86_64
    qemu-user-static-4.1.1-1.fc31.x86_64

> It seems to me like either systemd-nspawn or EPEL7 bug ...
> @msekletar can you please take a look at this if you know what is going on?

I was planning to spend some time debugging this, but since Pavel is reporting that it works on latest Fedora I figure that my assistance is not needed anymore.
Michal,
actually Pavel accidentally tried the reproducer on an x86_64 machine instead of aarch64, so, unfortunately, it is not fixed yet. On x86_64 it worked from the beginning.

(In reply to Jakub Kadlčík from comment #11)
> Michal,
> actually Pavel accidentally tried the reproducer on x86_64 machine instead
> of aarch64,
> so, unfortunately, it is not fixed yet. On x86_64 it worked from the
> beginning.

I've tried to debug this further but unfortunately without much success. I couldn't figure out why the kernel is returning EPERM to userspace. Debugging a production kernel w/o recompiling it is a very frustrating experience (due to aggressive optimisations and inlining). To be quite honest, I am not going to invest hours of my time into doing that. Please file a bug against kernel.

FWIW I was only seeing this in EPEL7 aarch64, presumably because newer glibc will actually use the prlimit system call, which *does* work. I've worked around it for now by changing the source code (of ocserv) to explicitly use prlimit() instead of relying on glibc to implement [gs]etrlimit() using SYS_prlimit.

Also reported upstream: https://pagure.io/copr/copr/issue/1368

It looks like we're using systemd-nspawn, which has its own list of system calls to filter. That may well be where the problem is.

Per Michal's recommendation, I am moving this bug to the Fedora kernel component. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1813089#c7 for a reproducer.

Hello John, just gently pinging to ask if we have any ETA on this? Thank you

One more ping, pretty please, do we have any ETA? Thank you again.

(In reply to David Woodhouse from comment #13)
> FWIW I was only seeing this in EPEL7 aarch64, presumably because newer
> glibc will actually use the prlimit system call, which *does* work. I've
> worked around it for now by changing the source code (of ocserv) to
> explicitly use prlimit() instead of relying on glibc to implement
> [gs]etrlimit() using SYS_prlimit.
(In reply to David Woodhouse from comment #15)
> It looks like we're using systemd-nspawn, which has its own list of system
> calls to filter. That may well be where the problem is.

Indeed, this works:

    int main()
    {
        struct rlimit lim = {0, 0};
        if (prlimit(0, RLIMIT_CORE, &lim, NULL)) {
            perror("setrlimit");
        }
    }

.. while this doesn't:

    int main()
    {
        struct rlimit lim = {0, 0};
        if (setrlimit(RLIMIT_CORE, &lim)) {
            perror("setrlimit");
        }
    }

... on the Fedora aarch64 host system directly, both variants work.

Michal, I'm not sure. If the call was really blocked by systemd, would the errno be EPERM? What made you think this is a kernel issue in particular?

This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Ok, this seems to work fine with

    $ rpm -q systemd kernel mock
    systemd-246.6-3.fc33.aarch64
    package kernel is not installed
    mock-2.6-1.fc33.noarch

There's also a possibility to disable systemd-nspawn in copr by switching to --isolation=simple. See the chroot configuration web-UI page.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days