Bug 1747933 - systemd does not work with podman and cgroupsV2
Summary: systemd does not work with podman and cgroupsV2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: crun
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Giuseppe Scrivano
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-02 09:50 UTC by Lukas Slebodnik
Modified: 2019-09-19 14:30 UTC (History)
CC List: 9 users

Fixed In Version: crun-0.9.1-1.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-19 14:30:14 UTC
Type: Bug
Embargoed:


Description Lukas Slebodnik 2019-09-02 09:50:15 UTC
Description of problem:
systemd does not work with podman in fedora 31 due to switching to cgroupsV2

Version-Release number of selected component (if applicable):
sh$ rpm -q podman crun
podman-1.5.1-2.17.dev.gitce64c14.fc31.x86_64
crun-0.8-1.fc31.x86_64

How reproducible:
Deterministic

Steps to Reproduce:
1. dnf install -y podman
2. podman pull registry.access.redhat.com/rhel7-init
3. podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status

Actual results:
sh# podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status
c8567461948439bce72fad3076a91ececfb7b14d469bfa5fbc32c6403185beff
Failed to get D-Bus connection: Operation not permitted
Error: non zero exit code: 1: OCI runtime error

Expected results:
sh# podman run --name test -d registry.access.redhat.com/rhel7-init:latest && sleep 10 && podman exec test systemctl status
6cda1824877a36e019c80528048d24ee5152c38bcb0cec7625f863669dd2881a

● 6cda1824877a
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Mon 2019-09-02 09:47:06 UTC; 10s ago
   CGroup: /machine.slice/libpod-6cda1824877a36e019c80528048d24ee5152c38bcb0cec7625f863669dd2881a.scope
           ├─ 1 /sbin/init
           ├─29 systemctl status
           └─system.slice
             ├─systemd-journald.service
             │ └─18 /usr/lib/systemd/systemd-journald
             └─dbus.service
               └─26 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Additional info:

The workaround is to disable cgroupsV2 with the kernel command line parameter systemd.unified_cgroup_hierarchy=0.
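
To confirm which hierarchy a host is running before or after applying the workaround, the filesystem type mounted at /sys/fs/cgroup can be inspected. The following is a minimal sketch, assuming GNU stat is available; the helper name cgroup_mode is illustrative only and is not part of podman, crun, or systemd. The grubby line in the trailing comment is one common way to add a kernel parameter on Fedora and assumes grubby is installed.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
# cgroup_mode is a hypothetical helper name used only for this sketch.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid layout
        *)         echo "unknown" ;;
    esac
}

# GNU stat prints the filesystem type name with -f -c %T.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
echo "cgroup mode: $(cgroup_mode "$fstype")"

# One way to apply the workaround (assumption: grubby is installed; reboot afterwards):
#   grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
```

After a reboot with the parameter set, the script should no longer report v2.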

Comment 1 Daniel Walsh 2019-09-02 11:12:01 UTC
Lukas, any idea what is being blocked?

Could you try with a --privileged container, to see if it is security blocking the creation?

Comment 2 Giuseppe Scrivano 2019-09-02 11:17:57 UTC
It requires support from systemd as well.  I don't think the version shipped with rhel7 has cgroups v2 support.

Could you try with a rhel8 image?

Also, exec with systemd containers is known to be broken on cgroups v2.  On cgroups v2 it is not possible to join a parent node; since systemd modifies the cgroup hierarchy, the exec will fail with "Device or resource busy".  I am not sure yet how to solve this issue.

Comment 3 Lukas Slebodnik 2019-09-02 12:08:14 UTC
(In reply to Giuseppe Scrivano from comment #2)
> It requires support from systemd as well.  I don't think the version shipped
> with rhel7 has cgroups v2 support.
> 
> Could you try with a rhel8 image?
> 
> Also, exec with systemd containers is known to be broken on cgroups v2.  On
> cgroups v2 it is not possible to join a parent node, since systemd modifies
> the cgroup hierarchy, the exec will fail with "Device or resource busy".  I
> am not sure yet how to solve this issue


yep,

sh# podman run --name test -d registry.access.redhat.com/ubi8-init:latest && sleep 10 && podman exec test systemctl status
e01001c8e5513b603dc8d752a22789f8d945f27367ed336f4e1b151eec0e5253
Error: writing file '/sys/fs/cgroup//machine.slice/libpod-e01001c8e5513b603dc8d752a22789f8d945f27367ed336f4e1b151eec0e5253.scope/cgroup.procs': Device or resource busy: OCI runtime error

But it is quite problematic if the new podman cannot run older images (rhel7, fedora, or a random image from the net)
with systemd. People will either disable cgroupsV2 or stop using podman altogether.

Comment 4 Lukas Slebodnik 2019-09-02 12:35:11 UTC
(In reply to Daniel Walsh from comment #1)
> Lukas, any idea what is being blocked?
> 
> Could you try with a --privileged container, to see if it is security
> blocking the creation?

I think Giuseppe already provided an explanation, but just for the record:
there is no difference with --privileged.

Comment 5 Giuseppe Scrivano 2019-09-02 12:47:47 UTC
> But that's quite problematic if new podman cannot run some older
> (rhel7/fedora/ random image from net)
> with systemd. People will either disable cgroupsV2 or even will not use
> podman at all.

The issue only happens when the container payload tries to access cgroups v1.  It is a known issue; for example, cgroups v2 adoption was, and still is, also blocked by the Java VM, which reads cgroups stats.

There is not really much Libpod can do.  Cgroups are a kernel interface, so either the container payload supports cgroups v2 or you'll need to use cgroups v1.

Comment 6 Lukas Slebodnik 2019-09-02 12:52:39 UTC
Please enhance the documentation (details about systemd would be good as well).

Comment 7 Lukas Slebodnik 2019-09-02 13:06:40 UTC
Moreover, I tried with a rawhide container, which definitely has the right version of systemd, and it did not help either.

sh-5.0# mkdir temp
sh-5.0# cat >temp/Dockerfile <<EOF
FROM fedora:rawhide

CMD ["/sbin/init"]

STOPSIGNAL SIGRTMIN+3

RUN dnf update -y --best && dnf clean all

#mask systemd-machine-id-commit.service - partial fix for https://bugzilla.redhat.com/show_bug.cgi?id=1472439


RUN systemctl mask systemd-remount-fs.service dev-hugepages.mount sys-fs-fuse-connections.mount systemd-logind.service getty.target console-getty.service systemd-udev-trigger.service systemd-udevd.service systemd-random-seed.service systemd-machine-id-commit.service

RUN dnf -y install procps-ng && dnf clean all
EOF

sh-5.0# podman build -t fedora-init-cgroupsv2 temp/

//snip

sh-5.0# podman run --name test -d fedora-init-cgroupsv2 && sleep 10 && podman exec test systemctl status
0eefd01dfaa8d9cc5b9abe4c46f60dbc7301eb0916e2c65cac074064310763f6
Error: writing file '/sys/fs/cgroup//machine.slice/libpod-0eefd01dfaa8d9cc5b9abe4c46f60dbc7301eb0916e2c65cac074064310763f6.scope/cgroup.procs': Device or resource busy: OCI runtime error

Comment 8 Giuseppe Scrivano 2019-09-02 13:19:13 UTC
Opened a PR here: https://github.com/containers/libpod/pull/3922

The error you are seeing is coming from exec.  It is a known issue with joining an existing cgroup on cgroups v2, and I am still unsure how to fix it correctly.  Basically, we cannot join the initial cgroup path as it will have subdirectories, so we will need to join a subdirectory.
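
The constraint described above can be sketched as a check on a cgroup directory. On cgroups v2, a non-root cgroup that has controllers enabled in its cgroup.subtree_control file generally cannot hold processes itself, so writing a PID to its cgroup.procs is rejected. This is a simplified sketch and not the logic used by crun or libpod: it ignores threaded cgroups, and the helper name delegates_controllers is hypothetical.

```shell
#!/bin/sh
# Report whether a cgroup v2 directory has controllers enabled for its children.
# If it does, attaching a process directly to it is expected to fail
# (simplified view of the "no internal processes" rule; the root cgroup is exempt).
# delegates_controllers is a hypothetical helper name for this sketch.
delegates_controllers() {
    ctl="$1/cgroup.subtree_control"
    # cgroupfs files report size 0, so read the content instead of testing the size.
    [ -r "$ctl" ] && [ -n "$(tr -d ' \n' < "$ctl")" ]
}

# Usage: pass the container's scope directory as the first argument.
if [ -n "${1:-}" ]; then
    if delegates_controllers "$1"; then
        echo "$1: has child controllers enabled, join a subdirectory instead"
    else
        echo "$1: no child controllers enabled, processes may be attached here"
    fi
fi
```

Run against the libpod scope path from the error message, a systemd container would be expected to land in the first branch, which matches the "Device or resource busy" failure on exec.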

Comment 9 Giuseppe Scrivano 2019-09-02 13:42:48 UTC
Also opened a PR for crun to address the exec issue: https://github.com/containers/crun/pull/81

Comment 10 Fedora Update System 2019-09-11 21:55:12 UTC
FEDORA-2019-e53d9e7494 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e53d9e7494

Comment 11 Fedora Update System 2019-09-12 14:44:54 UTC
crun-0.9-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e53d9e7494

Comment 12 Fedora Update System 2019-09-13 14:45:46 UTC
FEDORA-2019-f73801f1f2 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-f73801f1f2

Comment 13 Fedora Update System 2019-09-14 01:40:34 UTC
crun-0.9.1-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-f73801f1f2

Comment 14 Fedora Update System 2019-09-19 14:30:14 UTC
crun-0.9.1-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.
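
To check whether a host already has the fixed crun, the installed version can be compared against 0.9.1. A minimal sketch, assuming GNU sort (for -V) and an RPM-based system; version_ge is a hypothetical helper, and sort's version ordering only approximates RPM's own comparison rules.

```shell
#!/bin/sh
# True when version $1 is greater than or equal to version $2,
# using GNU sort's version ordering (an approximation of RPM ordering).
# version_ge is a hypothetical helper name used only for this sketch.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query only the upstream version of the installed crun package, if any.
installed=$(rpm -q --qf '%{VERSION}' crun 2>/dev/null) || installed=""

if [ -z "$installed" ]; then
    echo "crun is not installed (or rpm is unavailable)"
elif version_ge "$installed" "0.9.1"; then
    echo "crun $installed includes the fix"
else
    echo "crun $installed predates the fix, update to 0.9.1 or later"
fi
```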

