
Bug 2036145

Summary: systemd-250-2.fc36 doesn't boot in Cloud-Base image compose for rawhide on aarch64, s390x, ppc64el
Product: [Fedora] Fedora
Reporter: Kevin Fenzi <kevin>
Component: systemd
Assignee: systemd-maint
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: dan, fedoraproject, filbranden, flepied, jeremy.linton, lnykryn, msekleta, pbrobinson, ryncsn, ssahani, s, systemd-maint, yulia.kartseva, yuwatana, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: systemd-250.2-1.fc36
Last Closed: 2022-01-10 21:41:12 UTC
Type: Bug
Bug Depends On: 2035608    
Bug Blocks: 245418, 467765    

Description Kevin Fenzi 2021-12-29 23:40:15 UTC
When trying to make the Cloud-Base image for rawhide on aarch64, the compose boots a VM, which gets as far as:

[   10.284609] Run /init as init process
[   10.332145] systemd[1]: systemd v250-2.fc36 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)                                        
[   10.367033] systemd[1]: Detected virtualization kvm.                                             
[   10.368804] systemd[1]: Detected architecture arm64.                                             
[   10.373847] systemd[1]: Running in initial RAM disk.                                             
[   10.381737] systemd[1]: No hostname configured, using default hostname.                          
[   10.389747] systemd[1]: Hostname set to <fedora>.                                                
[   10.395949] systemd[1]: Initializing machine ID from VM UUID.                                    
[   10.699809] systemd[1]: Failed to link 'restrict_filesystems' LSM BPF program: Cannot allocate memory
[   10.736605] systemd[1]: Failed to allocate manager object: Cannot allocate memory                
[   10.739118] systemd[1]: Freezing execution.

I have untagged systemd-250-2.fc36 and systemd-250-1.fc36 in order to get a compose. 

x86_64 seems fine; it's just aarch64 that's failing this way.

Comment 1 Zbigniew Jędrzejewski-Szmek 2021-12-30 11:32:20 UTC
This is fishy. The code in systemd calls bpf_program__attach_lsm(), which does calloc(1, sizeof(const struct bpf_program)).
This shouldn't fail with ENOMEM. I'll try to reproduce this locally to debug.

Comment 2 Zbigniew Jędrzejewski-Szmek 2021-12-30 21:02:38 UTC
I can reproduce the issue locally on rpi4 with systemd built from git.

While the issue is being debugged, I started a new build with -Dbpf-framework=false on arm64
as a work-around.

Comment 3 Peter Robinson 2021-12-31 13:12:07 UTC
Is there an upstream bug where this is being tracked?

Comment 4 Zbigniew Jędrzejewski-Szmek 2022-01-03 15:35:26 UTC
It turns out that this is easily reproducible using the systemd unit tests:
$ sudo SYSTEMD_LOG_LOCATION=1 build/test-bpf-lsm
...
src/core/cgroup.c:3450: Controller 'bpf-firewall' supported: yes
src/core/cgroup.c:3450: Controller 'bpf-devices' supported: yes
src/core/cgroup.c:3450: Controller 'bpf-foreign' supported: yes
src/core/cgroup.c:3450: Controller 'bpf-socket-bind' supported: yes
src/core/cgroup.c:3450: Controller 'bpf-restrict-network-interfaces' supported: yes
libbpf: Error in bpf_create_map_xattr(cgroup_hash):ERROR: strerror_r(-524)=22(-524). Retrying without BTF.
libbpf: prog 'restrict_filesystems': failed to attach: ERROR: strerror_r(-524)=22
src/core/bpf-lsm.c:199: Failed to link 'restrict_filesystems' LSM BPF program: Cannot allocate memory
src/test/test-bpf-lsm.c:96: Assertion 'manager_new(UNIT_FILE_SYSTEM, MANAGER_TEST_RUN_BASIC, &m) >= 0' failed at src/test/test-bpf-lsm.c:96, function main(). Aborting.
Aborted

"Cannot allocate memory" is a buglet in systemd: it assumes the error is one byte, and
524 & 255 gives 12, i.e. ENOMEM. I'll submit a patch to not do this truncation.

But the real issue is that bpf_program__attach_lsm() returns an error and a bogus errno value.
I don't know the libbpf code at all, but it seems the error is in the kernel:

$ sudo SYSTEMD_LOG_LOCATION=1 strace -y build/test-bpf-lsm
...
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\374\1\0\0\374\1\0\0\250\3\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1468, btf_log_size=0, btf_log_level=0}, 120) = 8<anon_inode:btf>
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=8, value_size=4, max_entries=2048, map_flags=0, inner_map_fd=7<anon_inode:bpf-map>, map_name="cgroup_hash", map_ifindex=0, btf_fd=8<anon_inode:btf>, btf_key_type_id=6, btf_value_type_id=10, btf_vmlinux_value_type_id=0}, 120) = -1 ENOTSUPP (Unknown error 524)
write(2</dev/pts/0>, "libbpf: Error in bpf_create_map_"..., 107libbpf: Error in bpf_create_map_xattr(cgroup_hash):ERROR: strerror_r(-524)=22(-524). Retrying without BTF.
) = 107
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=8, value_size=4, max_entries=2048, map_flags=0, inner_map_fd=7<anon_inode:bpf-map>, map_name="cgroup_hash", map_ifindex=0, btf_fd=0</dev/pts/0>, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = 9<anon_inode:bpf-map>
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_LSM, insn_cnt=61, insns=0x38b9f020, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 15, 11), prog_flags=0, prog_name="restrict_filesy", prog_ifindex=0, expected_attach_type=BPF_LSM_MAC, prog_btf_fd=8<anon_inode:btf>, func_info_rec_size=8, func_info=0x38b9d970, func_info_cnt=1, line_info_rec_size=16, line_info=0x38b96300, line_info_cnt=23, attach_btf_id=27216, attach_prog_fd=0</dev/pts/0>}, 120) = 10<anon_inode:bpf-prog>
brk(0x38bc0000)                         = 0x38bc0000
close(7<anon_inode:bpf-map>)            = 0
bpf(BPF_RAW_TRACEPOINT_OPEN, {raw_tracepoint={name=NULL, prog_fd=10<anon_inode:bpf-prog>}}, 120) = -1 ENOTSUPP (Unknown error 524)
write(2</dev/pts/0>, "libbpf: prog 'restrict_filesyste"..., 82libbpf: prog 'restrict_filesystems': failed to attach: ERROR: strerror_r(-524)=22
) = 82
writev(2</dev/pts/0>, [{iov_base="src/core/bpf-lsm.c:199: ", iov_len=24}, {iov_base="Failed to link 'restrict_filesys"..., iov_len=72}, {iov_base="\n", iov_len=1}], 3src/core/bpf-lsm.c:199: Failed to link 'restrict_filesystems' LSM BPF program: Unknown error 524
) = 97


I'm not sure what we should do here. So far we have assumed that if the bpf syscall is available and seems
to work, it will keep working later on, and we throw an error if it doesn't.

Comment 5 Zbigniew Jędrzejewski-Szmek 2022-01-03 17:01:45 UTC
https://github.com/systemd/systemd/pull/21984 is the cleanup commit for systemd.

Comment 6 Zbigniew Jędrzejewski-Szmek 2022-01-04 08:39:02 UTC
Also on s390x, see https://bugzilla.redhat.com/show_bug.cgi?id=2035608#c5.

Comment 7 Zbigniew Jędrzejewski-Szmek 2022-01-04 11:33:54 UTC
Interestingly, errno 524 is also returned on amd64:

bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=8, value_size=4, max_entries=2048, map_flags=0, inner_map_fd=3<anon_inode:bpf-map>, map_name="cgroup_hash", map_ifindex=0, btf_fd=4<anon_inode:btf>, btf_key_type_id=6, btf_value_type_id=10, btf_vmlinux_value_type_id=0}, 120) = -1 ENOTSUPP (Unknown error 524)
write(2<pipe:[8562394]>, "libbpf: Error in bpf_create_map_"..., 107libbpf: Error in bpf_create_map_xattr(cgroup_hash):ERROR: strerror_r(-524)=22(-524). Retrying without BTF.
) = 107
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH_OF_MAPS, key_size=8, value_size=4, max_entries=2048, map_flags=0, inner_map_fd=3<anon_inode:bpf-map>, map_name="cgroup_hash", map_ifindex=0, btf_fd=0</dev/pts/4>, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 120) = 5<anon_inode:bpf-map>
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=6, insns=0x7ffeb2f682f0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0</dev/pts/4>, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0</dev/pts/4>}, 120) = 6<anon_inode:bpf-prog>
close(6<anon_inode:bpf-prog>)           = 0

Comment 8 Zbigniew Jędrzejewski-Szmek 2022-01-04 16:52:29 UTC
So the status is:
amd64: works
arm64: errno 524 (https://bugzilla.redhat.com/show_bug.cgi?id=2036145#c0)
arm: not supported (libbpf: failed to load BPF skeleton 'restrict_fs_bpf': -3, with 5.15.5-100.fc34.armv7hl+lpae)
        ('3' is ESRCH 3 No such process, but OK.)
s390x: errno 524 (https://bugzilla.redhat.com/show_bug.cgi?id=2035608#c5)
ppc64el: errno 524 (reproduced on ppc64le-test.fedorainfracloud.org with 5.14.9-200.fc34.ppc64le)

I'll disable this also on s390x.

Comment 9 Jeremy Linton 2022-01-05 20:39:38 UTC
Yah, I reproduced it too: the bpf_raw_tracepoint_open() call (sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr))) fails on both 5.15 and 5.16. Presumably these unit tests were working in the past? The kernel BPF selftests are a mess of failures on aarch64 with 5.16-rc8; I just ran them with a Fedora-configured kernel.

BPF has the same problem here that CoreSight has: it is a real pain to figure out what in the kernel is causing the failure, since it tends not to print any diagnostics when these calls fail.

So the BPF program looks fine, and the SEC("lsm/file_open") point appears to exist (although the mapping isn't 100% clear to me since I tend to use bpftrace and the tracepoint naming is slightly different, but bpf_lsm_file_open seems to be the same on both amd64 and arm64).

So, yah, now I'm looking for whoever returns the ENOTSUPP in the kernel (there is a long list of bpf cases that can cause that, and none of the ones I see at the moment are trivially reached via BPF_RAW_TRACEPOINT_OPEN).


If we roughly knew the last working kernel version, I could take a stab at bisecting it rather than walking the call path until I find the problem.

Comment 10 Jeremy Linton 2022-01-06 18:36:26 UTC
So, FYI, the problem exists as far back as 5.8, so... I'm a bit confused about how this was working. Was the systemd in Fedora older than the commit that enabled this test in 2020?

Comment 11 Jeremy Linton 2022-01-06 21:35:59 UTC
Well, this appears to be caused by the lack of arch_prepare_bpf_trampoline() on anything that isn't x86.

Comment 12 Julia Kartseva 2022-01-07 00:07:52 UTC
(In reply to Jeremy Linton from comment #11)
> Well, this appears to be caused by the lack of arch_prepare_bpf_trampoline()
> on anything that isn't x86.

Yes, that's exactly the issue. Fix to systemd which relaxes BPF LSM set up: https://github.com/systemd/systemd/pull/22025

Comment 13 Zbigniew Jędrzejewski-Szmek 2022-01-07 15:57:26 UTC
IIUC, BPF LSM just cannot work on architectures other than x86/amd64 because of missing kernel support.
With Julia's patch we should handle that gracefully. It'll be included in the build that I'll do later today.
Hopefully support will be added for other arches in future kernels.

There's still the issue of the bogus errno, so I won't close this bug yet.

Comment 14 Zbigniew Jędrzejewski-Szmek 2022-01-10 21:41:12 UTC
Systemd should now handle this gracefully.

I opened a new bug for libbpf and the return value: #2039080