Bug 1813089 - aarch builder-specific failure with setrlimit (?)
Summary: aarch builder-specific failure with setrlimit (?)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2020-03-12 22:49 UTC by Dave Love
Modified: 2023-09-12 03:43 UTC (History)
CC List: 26 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-25 11:55:44 UTC
Type: Bug
Embargoed:


Attachments
Output of strace gpg2 on aarch64 builder (19.22 KB, text/plain)
2020-03-16 00:29 UTC, Jakub Kadlčík
Output of strace gpg2 on x86_64 builder (11.33 KB, text/plain)
2020-03-16 00:30 UTC, Jakub Kadlčík

Description Dave Love 2020-03-12 22:49:47 UTC
Description of problem:

I'm seeing failures like this consistently for aarch64, but not for x86_64:

+ gpg2 --homedir /tmp/tmp.movxj1mnvg --no-default-keyring --quiet --yes --output /tmp/tmp.movxj1mnvg/gpgkey-BE07D9FD54809AB2C4B0FF5F63762CDA67E2F359.asc.gpg --dearmor /builddir/build/SOURCES/gpgkey-BE07D9FD54809AB2C4B0FF5F63762CDA67E2F359.asc
gpg: Fatal: can't disable core dumps: Operation not permitted

An example is https://copr.fedorainfracloud.org/coprs/loveshack/openconnect/build/1303581/

Comment 1 Jakub Kadlčík 2020-03-13 00:56:36 UTC
Hello Dave,
thank you for the report.

I tried to download the SRPM and build it locally in mock and it worked fine.
Seems like it is really a Copr issue. I will investigate it more in the morning.

Comment 2 Dave Love 2020-03-13 11:04:17 UTC
I should have mentioned that it may be connected with the copr move, as I'd previously built it in another project: https://copr.fedorainfracloud.org/coprs/loveshack/livhpc/build/1081587/

Comment 3 Jakub Kadlčík 2020-03-16 00:26:35 UTC
The easiest reproducer is to provision an aarch64 builder and do

[root@ip-172-30-2-18 ~]# mock -r epel-7-aarch64 --shell
<mock-chroot> sh-4.2# gpg2
gpg: Fatal: can't disable core dumps: Operation not permitted

I installed strace to see what is going on. This is IMHO the interesting part

getrlimit(RLIMIT_CORE, 0xffffdfae5c58)  = -1 EPERM (Operation not permitted)
setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=0}) = -1 EPERM (Operation not permitted)

versus how the output looked on the x86_64 builder:

getrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_CORE, {rlim_cur=0, rlim_max=RLIM64_INFINITY}) = 0

I tried to do `setenforce 0` on the builder and it didn't seem to help.
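
For anyone who wants to check a builder without installing gpg2, the pair of calls visible in the strace output can be approximated by a small C program. This is only a sketch: `try_disable_core_dumps` is a hypothetical helper name, and it is not taken from the gpg sources. Which system call the glibc wrappers actually issue depends on the glibc version inside the chroot, so it is worth running the result under strace as well.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: query RLIMIT_CORE, then lower the soft limit to 0.
 * Returns 0 on success, or the errno value of the first call that failed. */
int try_disable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        int err = errno;
        fprintf(stderr, "getrlimit(RLIMIT_CORE): %s\n", strerror(err));
        return err;
    }

    /* Lower only the soft limit; leaving rlim_max alone needs no privileges. */
    lim.rlim_cur = 0;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        int err = errno;
        fprintf(stderr, "setrlimit(RLIMIT_CORE): %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

Calling `try_disable_core_dumps()` from `main()` inside the mock chroot should print an "Operation not permitted" message on an affected builder and nothing on a working one.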

Comment 4 Jakub Kadlčík 2020-03-16 00:29:48 UTC
Created attachment 1670373
Output of strace gpg2 on aarch64 builder

Comment 5 Jakub Kadlčík 2020-03-16 00:30:33 UTC
Created attachment 1670374
Output of strace gpg2 on x86_64 builder

Comment 6 Jakub Kadlčík 2020-03-25 11:32:05 UTC
I am continuing to debug this issue and it seems to be systemd-nspawn related. Running mock
with simple chroot isolation works fine:


[root@ip-172-30-2-210 ~]# mock -r epel-7-aarch64 --shell --isolation=simple
<mock-chroot> sh-4.2# gpg2
...
gpg: Go ahead and type your message ...

Comment 7 Jakub Kadlčík 2020-03-26 12:16:20 UTC
This is the lowest-level reproducer I have been able to come up with:


[root@ip-172-30-2-210 ~]#  mock -r epel-7-aarch64 init --disable-plugin=tmpfs
[root@ip-172-30-2-210 ~]# systemd-nspawn -D /var/lib/mock/epel-7-aarch64/root/
Spawning container root on /var/lib/mock/epel-7-aarch64/root.
Press ^] three times within 1s to kill container.
-bash-4.2# ulimit -c unlimited
-bash: ulimit: core file size: cannot get limit: Operation not permitted
-bash-4.2#


It seems to me like either systemd-nspawn or EPEL7 bug ...
@msekletar can you please take a look at this if you know what is going on?

Comment 8 Pavel Raiskup 2020-04-14 08:09:31 UTC
I cannot reproduce this on Fedora 32 now. @frostyx, can you retry?
kernel-5.6.3-300.fc32.x86_64
systemd-245.4-1.fc32.x86_64
qemu-user-static-4.2.0-7.fc32.x86_64

Comment 9 Pavel Raiskup 2020-04-14 08:21:59 UTC
Not even on F31, so I suppose we need to update copr builders
if anything and close this bug?

$ rpm -q kernel systemd qemu-user-static
kernel-5.3.15-300.fc31.x86_64
kernel-5.5.10-200.fc31.x86_64
systemd-243.8-1.fc31.x86_64
qemu-user-static-4.1.1-1.fc31.x86_64

Comment 10 Michal Sekletar 2020-04-14 18:58:16 UTC
> It seems to me like either systemd-nspawn or EPEL7 bug ...
> @msekletar can you please take a look at this if you know what is going on?

I was planning to spend some time debugging this, but since Pavel is reporting that it works on latest Fedora I figure that my assistance is not needed anymore.

Comment 11 Jakub Kadlčík 2020-04-14 20:24:21 UTC
Michal,
actually Pavel accidentally tried the reproducer on x86_64 machine instead of aarch64,
so, unfortunately, it is not fixed yet. On x86_64 it worked from the beginning.

Comment 12 Michal Sekletar 2020-05-03 14:54:10 UTC
(In reply to Jakub Kadlčík from comment #11)
> Michal,
> actually Pavel accidentally tried the reproducer on x86_64 machine instead
> of aarch64,
> so, unfortunately, it is not fixed yet. On x86_64 it worked from the
> beginning.

I've tried to debug this further, but unfortunately without much success. I couldn't figure out why the kernel is returning EPERM to userspace. Debugging a production kernel without recompiling it is a very frustrating experience (due to aggressive optimisations and inlining). To be quite honest, I am not going to invest hours of my time into doing that. Please file a bug against the kernel.

Comment 13 David Woodhouse 2020-05-04 16:18:21 UTC
FWIW I was only seeing this in EPEL7 aarch64 — presumably because newer glibc will actually use the prlimit system call, which *does* work. I've worked around it for now by changing the source code (of ocserv) to explicitly use prlimit() instead of relying on glibc to implement [gs]etrlimit() using SYS_prlimit.

Comment 14 Pavel Raiskup 2020-05-05 06:29:49 UTC
Also reported upstream:
https://pagure.io/copr/copr/issue/1368

Comment 15 David Woodhouse 2020-05-05 09:56:29 UTC
It looks like we're using systemd-nspawn, which has its own list of system calls to filter. That may well be where the problem is.
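
One way to test the filtering hypothesis from inside the container is to check whether a seccomp filter is attached to the process at all. The sketch below reads the `Seccomp:` field from a `/proc/<pid>/status`-style file, on kernels that expose that field. The function name `seccomp_mode_from` is hypothetical. A result of 2 only shows that some filter is active; it does not say which system calls the filter rejects.

```c
#include <stdio.h>

/* Hypothetical helper: return the seccomp mode recorded in a
 * /proc/<pid>/status-style file.
 * 0 = disabled, 1 = strict, 2 = filter, -1 = file or field not found. */
int seccomp_mode_from(const char *status_path)
{
    FILE *f = fopen(status_path, "r");
    char line[256];
    int mode = -1;

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* The field looks like "Seccomp:<TAB>2"; whitespace in the
         * format string matches the tab. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1)
            break;
    }
    fclose(f);
    return mode;
}
```

Running `seccomp_mode_from("/proc/self/status")` once under systemd-nspawn and once under `--isolation=simple` would show whether the two environments differ in this respect.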

Comment 16 Jakub Kadlčík 2020-05-11 09:24:15 UTC
Per Michal's recommendation, I am moving this bug to Fedora kernel component.
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1813089#c7 for a reproducer.

Comment 17 Jakub Kadlčík 2020-07-02 21:47:07 UTC
Hello John,
just gently pinging to ask if we have any ETA on this?

Thank you

Comment 18 Jakub Kadlčík 2020-08-17 13:32:54 UTC
One more ping,
pretty please, do we have any ETA?

Thank you again.

Comment 19 Pavel Raiskup 2020-08-31 06:21:17 UTC
(In reply to David Woodhouse from comment #13)
> FWIW I was only seeing this in EPEL7 aarch64 — presumably because newer
> glibc will actually use the prlimit system call, which *does* work. I've
> worked around it for now by changing the source code (of ocserv) to
> explicitly use prlimit() instead of relying on glibc to implement
> [gs]etrlimit() using SYS_prlimit.

(In reply to David Woodhouse from comment #15)
> It looks like we're using systemd-nspawn, which has its own list of system
> calls to filter. That may well be where the problem is.

Indeed, this works:

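    #define _GNU_SOURCE         /* for prlimit() */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Soft and hard core-file limits both set to zero. */
        struct rlimit lim = {0, 0};
        if (prlimit(0, RLIMIT_CORE, &lim, NULL)) {
            perror("prlimit");
        }
    }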

.. while this doesn't:

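    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Same limits, but through the setrlimit() wrapper. */
        struct rlimit lim = {0, 0};
        if (setrlimit(RLIMIT_CORE, &lim)) {
            perror("setrlimit");
        }
    }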

... on the Fedora aarch64 host system directly, both variants work.

Michal, I'm not sure. If the call was really blocked by systemd,
would the errno be EPERM? What made you think this is a kernel issue
in particular?
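
On the errno question: a seccomp filter can be set up to fail a system call with an errno of the filter author's choosing, so EPERM alone does not separate a filter from a kernel-side check. What can be narrowed down is which system call fails. The sketch below bypasses the glibc wrappers and issues the legacy `getrlimit` and the newer `prlimit64` system calls directly. The helper names and the two raw structs are made up for this example, and the struct layouts are an assumption that holds on 64-bit Linux.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed kernel-side layouts; not the glibc struct rlimit. */
struct legacy_rlimit { unsigned long cur, max; };
struct raw_rlimit64  { unsigned long long cur, max; };

/* Query RLIMIT_CORE through the legacy syscall. Returns 0 or an errno
 * value; ENOSYS if the headers define no such syscall number. */
int raw_getrlimit_core(void)
{
#ifdef SYS_getrlimit
    struct legacy_rlimit lim;
    if (syscall(SYS_getrlimit, RLIMIT_CORE, &lim) != 0)
        return errno;
    return 0;
#else
    return ENOSYS;
#endif
}

/* Query RLIMIT_CORE through prlimit64. pid 0 means the calling process,
 * and a NULL new limit makes the call read-only. */
int raw_prlimit64_core(void)
{
    struct raw_rlimit64 lim;
    if (syscall(SYS_prlimit64, 0, RLIMIT_CORE, NULL, &lim) != 0)
        return errno;
    return 0;
}

/* Print one line per syscall path so the two results can be compared. */
void report_rlimit_paths(void)
{
    printf("getrlimit (legacy syscall): %s\n", strerror(raw_getrlimit_core()));
    printf("prlimit64 syscall:          %s\n", strerror(raw_prlimit64_core()));
}
```

If `report_rlimit_paths()` shows EPERM only for the legacy path inside the container and success for both paths on the host, that would be consistent with the two programs above behaving differently, though it still would not identify which layer rejects the call.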

Comment 20 Ben Cotton 2020-11-03 16:28:25 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 31 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Ben Cotton 2020-11-24 17:13:09 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 22 Pavel Raiskup 2020-11-25 11:55:44 UTC
Ok, this seems to work fine with
$ rpm -q systemd kernel mock
systemd-246.6-3.fc33.aarch64
package kernel is not installed
mock-2.6-1.fc33.noarch

Comment 23 Pavel Raiskup 2021-01-14 09:11:34 UTC
It is also possible to disable systemd-nspawn in Copr by switching
to --isolation=simple. See the chroot configuration page in the web UI.

Comment 24 Red Hat Bugzilla 2023-09-12 03:43:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

