Bug 1958474
Summary: dnf update causes error: The futex facility returned an unexpect

Product: Fedora
Component: containers-common
Version: rawhide
Hardware: armv7hl
OS: Linux
Status: ON_QA
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Story Points: ---
Doc Type: If docs needed, set a value
Reporter: Ted Sluis <ted.sluis>
Assignee: Lokesh Mandvekar <lsm5>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: acui, aoliva, arjun.is, bbaude, clems.verna, codonell, container-sig, debarshir, dj, dmach, dwalsh, fedoraproject, fweimer, gscrivan, jmracek, jnovy, jrohel, law, lhrazky, lsm5, mblaha, mcermak, mfabian, mhatina, mheon, packaging-team-maint, patrick, paul.0000.black, pehunt, pfrankli, pkratoch, rh.container.bot, rpm-software-management, rth, santiago, sipoyare, vmukhame, walters
Description
Ted Sluis
2021-05-08 07:17:31 UTC
Created attachment 1781496 [details]
journalctl
Added journalctl output. The dnf core dump looks related to /usr/lib/libc-2.33.so.
Seems more related to dnf than the container image itself. Moving to the dnf component for investigation.

I've tried the reproducer on x86_64 and didn't reproduce. Ted, you don't mention it: is this only reproducible on armv7? If so, could we perhaps try to bisect which package upgrade is causing this in the image? Seems like a good way to narrow this down. I assume rebuilding the image is required for this? Clement?

Ted, is it possible to provide a full backtrace from the coredump?

Yes, this is only on armv7 (Raspberry Pi 2B). On aarch64 and amd64 (x86_64) it works fine.

Full backtrace from the coredump. Is this what you were looking for?

[root@fed157 ~]# coredumpctl list --since=today
TIME                          PID UID GID SIG     COREFILE EXE                SIZE
Tue 2021-05-11 06:52:42 CEST 29515  0   0 SIGABRT present  /usr/bin/python3.9 4.5M

[root@fed157 ~]# coredumpctl info 29515
          PID: 29515 (dnf)
          UID: 0 (root)
          GID: 0 (root)
       Signal: 6 (ABRT)
    Timestamp: Tue 2021-05-11 06:52:39 CEST (6h ago)
 Command Line: /usr/bin/python3 /usr/bin/dnf update
   Executable: /usr/bin/python3.9
Control Group: /machine.slice/libpod-1f0b143b821f203098e9c9809e86c5762e7f0c02d44cc312481dd1aa51ae31a8.scope/container
         Unit: libpod-1f0b143b821f203098e9c9809e86c5762e7f0c02d44cc312481dd1aa51ae31a8.scope
        Slice: machine.slice
      Boot ID: 9471c84820ac4a2ea7fec8ddd428ed84
   Machine ID: 86fe2a8ab05c43fbaf7baa408b3ea19a
     Hostname: 1f0b143b821f
      Storage: /var/lib/systemd/coredump/core.dnf.0.9471c84820ac4a2ea7fec8ddd428ed84.29515.1620708759000000.zst (present)
    Disk Size: 4.5M
      Message: Process 29515 (dnf) of user 0 dumped core.

I will attach core.dnf.0.9471c84820ac4a2ea7fec8ddd428ed84.29515.1620708759000000.zst

Created attachment 1781994 [details]
Coredump /var/lib/systemd/coredump/core.dnf.0.9471c84820ac4a2ea7fec8ddd428ed84.29515.1620708759000000.zst
That is not useful. I'm not sure the .zst would work at all, but I'm sure I wouldn't be able to load the coredump on a different architecture anyway. To print the backtrace, run:

coredumpctl debug 29515

You'll need gdb installed for that; it should open the gdb console. Then type:

backtrace full

And paste the full output of that, please.

Created attachment 1782073 [details]
coredump dnf
I installed gdb and ran coredumpctl debug 29515.
Next I installed over 200 gdb dependency packages, as recommended in the gdb console.
Then I ran coredumpctl debug 29515 again and saved all the output in coredump-dnf.txt.
I hope it is useful, but I have my doubts ;-)
Thanks. The backtrace is not useless, though it doesn't point to the culprit. Pasting the relevant part:

#0  0xb6b810d4 in __libc_signal_restore_set (set=0xbec8096c) at ../sysdeps/unix/sysv/linux/internal-signals.h:105
        _a1 = 0
        _nr = 175
        _a3tmp = 0
        _a1tmp = 2
        _a3 = 0
        _a4tmp = 8
        _a2tmp = -1094186636
        _a2 = -1094186636
        _a4 = 8
#1  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:47
        set = {__val = {0, 0, 3014407344, 0, 3067418392, 3014407344, 3011337064, 1, 3067293352, 0, 972469504, 3067415732, 3013791856, 3011295096, 3067423660, 3011295096, 0, 3010864904, 3010864904, 3010865244, 3014440454, 20697120, 3010865224, 3010864904, 20700944, 12, 3067353600, 3069561466, 0, 20697172, 3010865212, 20700944}}
        pid = <optimized out>
        tid = <optimized out>
        ret = 0
#2  0xb6b69fbc in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x1645ea0, sa_sigaction = 0x1645ea0}, sa_mask = {__val = {23125480, 2980055192, 3200781172, 80, 972469504, 0, 23350660, 3200781172, 24649648, 23147584, 3042883392, 23147584, 30, 23125480, 2980055192, 3200781172, 1, 0, 3042885912, 23125480, 80, 23350592, 3200781236, 274877907, 0, 23125480, 1868850534, 1831756146, 1869771369, 1768697458, 1702061428, 1919252082}}, sa_flags = 980184622, sa_restorer = 0x1003038}
        sigs = {__val = {32, 0, 0, 20700944, 3010864904, 3067350800, 20700944, 3010864904, 3063890032, 3200781128, 0, 3067419296, 0, 3200781128, 12, 3014407344, 0, 3067418392, 3014407344, 3067282508, 3069445700, 3, 3200781104, 3067315232, 0, 159096470, 3200781048, 3068762576, 3013792648, 972469504, 3015454144, 0}}
#3  0xb6bbbd88 in fmemopen_seek (cookie=0x0, p=0xbec80974, w=<optimized out>) at fmemopen.c:112
        np = <optimized out>
        c = 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I haven't seen this before; it seems like badly corrupted memory (assuming the core dump is not damaged, but the top few frames already look very suspicious).
The bug can be anywhere: dnf, anything in the container, or even the host. I'm afraid, given this is on arm, my options to help are limited. Can you try bisecting the RPM versions that changed between the two images (a working one and the broken one) by starting with the good one and gradually upgrading to the versions of the broken image? If you find that installing newer versions of some RPMs introduces this crash, it's likely a bug in those. If not, it'll be a bug somewhere in the image creation?

Thanks for your update. I am able to reproduce this issue on a fresh Fedora 34 OS (another Raspberry Pi 2B, armv7hl) with a fresh fedora:34 armv7hl container image (created April 24, 2021). I experienced the same dnf update coredump issue. Previous versions of the fedora:34 container image are not available anymore, see https://registry.fedoraproject.org/repo/fedora/tags/

Not sure how to continue with this. Is it possible to re-direct this issue to the team that is responsible for the fedora:34 container image?

(In reply to Ted Sluis from comment #9)
> Not sure how to continue with this. Is it possible to re-direct this issue
> to the team that is responsible for the fedora:34 container image?

Good question. I tried to reassign back to the fedora-container-image component, but I don't see it on the list. Clement, can you take a look at this? The easiest way to narrow this down seems to be to bisect which rpm upgrade broke the image.

Reassigned to fedora-container-image.

All of our image builds are available in the buildsystem: https://koji.fedoraproject.org/koji/packageinfo?packageID=26387

I can try to push a new update and see if that fixes the issue; otherwise, yes, we can try to test older images to identify the root cause of the problem.

Just pushed a new fedora:34 image, can you give it a try?

[cverna@localhost] $ podman images
REPOSITORY                         TAG  IMAGE ID      CREATED      SIZE
registry.fedoraproject.org/fedora  34   3567369c6711  4 hours ago  184 MB

I'm getting this as well.
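The bisection proposed above can be sketched generically: treat the package upgrades between the last working image and the broken one as an ordered list and binary-search for the first step whose resulting image crashes. The helper below is a hypothetical illustration, not part of any Fedora tooling; `image_is_good` stands in for "rebuild the image at this upgrade step and run `dnf update` in it".

```python
def first_bad_step(steps, image_is_good):
    """Binary-search the first upgrade step that breaks the image.

    Assumes the usual bisection precondition: every step before the
    first bad one is good, and every step after it is bad.
    Returns the index of the first bad step, or None if all are good.
    """
    lo, hi = 0, len(steps) - 1
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if image_is_good(steps[mid]):
            lo = mid + 1       # breakage must be in the later half
        else:
            first_bad = mid    # remember this candidate, look earlier
            hi = mid - 1
    return first_bad

# Example with a synthetic predicate: steps 0-4 good, 5 onward bad.
upgrades = [f"pkg-update-{i}" for i in range(10)]
print(first_bad_step(upgrades, lambda s: int(s.split("-")[-1]) < 5))  # → 5
```

With ~200 changed packages between two image snapshots, this narrows the culprit down in about eight rebuild-and-test rounds instead of two hundred.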
The most recent version I've tried is the armhfp variant of Fedora-Container-Base-34-20210616.0 from the link in comment 11.

This is likely 32 bit related. I was trying to debug a 32 bit LLVM OOM in https://bugzilla.redhat.com/show_bug.cgi?id=1974927

Now if mock/koji just used podman/kubernetes directly, there would already be a 32 bit container to pull. But since there isn't, I hacked up my own and pushed to quay.io/cgwalters/fedora-i686:34 from the current snapshot of https://kojipkgs.fedoraproject.org/repos/f34-build/latest/i386/

And it was then really easy to debug this; a quick strace from outside the container shows:

272619 futex_time64(0xf4cf0b28, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 6, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
272619 <... futex_time64 resumed>) = -1 EPERM (Operation not permitted)

Which, "EPERM from random system calls", immediately triggers my seccomp scars, and yep:

$ podman run --security-opt seccomp=unconfined --rm -ti quay.io/cgwalters/fedora-i686:34 setarch i686 bash
# dnf -y install cargo

then works fine. So... probably the podman seccomp filter needs to be updated to allow futex_time64.

As for why this started just recently, I suspect glibc was updated to assume the system call exists and works. Perhaps https://github.com/bminor/glibc/commit/a3e7aead03d558e77fc8b9dc4d567b7bb8619545 ?

Temporarily reassigning to glibc as a way to get them CC'd and discuss whether glibc needs to at least temporarily revert the use of the syscall.

(Right, sorry, the seccomp policy is actually in containers-common.)

(In reply to Colin Walters from comment #18)
> Temporarily reassigning to glibc as a way to get them CC'd and discuss
> whether glibc needs to at least temporarily revert the use of the syscall.

So what's podman's position here? Will you apply the runc kludge (generic ENOSYS handling), or will we always be fighting these EPERM errors?

If the former, what needs to happen before you can roll out generic ENOSYS handling?
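The failure mode described above hinges on how glibc probes for newer syscalls on 32-bit targets: it tries futex_time64 first and falls back to the old futex call only when the kernel reports ENOSYS. A seccomp filter that answers EPERM for syscalls it doesn't know defeats that probe, and glibc aborts with "The futex facility returned an unexpect[ed]..." error. The sketch below is a simplified Python model of that decision logic, not glibc's actual code:

```python
import errno

def futex_wait(time64_errno):
    """Model glibc's syscall probing. time64_errno is what the
    futex_time64 attempt returned: 0 on success, else an errno."""
    if time64_errno == 0:
        return "futex_time64 succeeded"
    if time64_errno == errno.ENOSYS:
        # The kernel (or a seccomp profile defaulting to ENOSYS) says
        # the syscall doesn't exist: fall back to the 32-bit futex call.
        return "fell back to futex"
    # Any other errno, e.g. EPERM from an old seccomp profile, is treated
    # as fatal and unexpected -- the abort seen in this bug.
    raise OSError(time64_errno,
                  "The futex facility returned an unexpected error code.")

print(futex_wait(errno.ENOSYS))  # → fell back to futex
```

This is why the runc-style "generic ENOSYS handling" Florian mentions fixes the whole class of bugs: blocked-but-unknown syscalls then look exactly like syscalls the kernel never had.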
New koji builds with updated seccomp.json can be found here:

f33: https://koji.fedoraproject.org/koji/buildinfo?buildID=1775245
f34: https://koji.fedoraproject.org/koji/buildinfo?buildID=1775182

I haven't yet submitted these to bodhi because I plan to add these together with podman v3.2.2, scheduled for release tomorrow.

I think this is Giuseppe's call.

futex_time64 is permitted now with: https://github.com/containers/common/pull/597

(In reply to Florian Weimer from comment #21)
> So what's podman's position here? Will you apply the runc kludge (generic
> ENOSYS handling), or will we always be fighting these EPERM errors?
>
> If the former, what needs to happen before you can roll out generic ENOSYS
> handling?

The issue is also fixed in containers/common. We switched the default to ENOSYS instead of EPERM.

FEDORA-2021-bc6a62a2c5 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-bc6a62a2c5

FEDORA-2021-0c53d8738d has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-0c53d8738d

FEDORA-2021-bc6a62a2c5 has been pushed to the Fedora 34 testing repository. Soon you'll be able to install the update with the following command:

`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-bc6a62a2c5`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-bc6a62a2c5

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-0c53d8738d has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command:

`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-0c53d8738d`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-0c53d8738d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
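The two fixes just described (explicitly allowing futex_time64, and defaulting unknown syscalls to ENOSYS rather than EPERM) correspond to seccomp profile entries roughly like the following. This is an illustrative fragment using the OCI runtime-spec seccomp field names that containers-common's seccomp.json follows, not the exact Fedora diff:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": ["futex", "futex_time64"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With defaultErrnoRet set to 38 (ENOSYS on Linux), a syscall the profile doesn't list looks to the caller as if the kernel simply lacks it, which is exactly the signal glibc's fallback probing expects.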
FEDORA-2021-bc6a62a2c5 has been pushed to the Fedora 34 stable repository. If problem still persists, please make note of it in this bug report.