Bug 1647947
Summary: | dhclient fails with "Can't install packet filter program: Unknown error 524" [ppc64le] | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Menanteau Guy <menantea> | ||||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | rawhide | CC: | airlied, awilliam, bskeggs, bugproxy, dan, dcantrell, ewk, hannsj_uhl, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, jpopelka, kernel-maint, labbott, linville, mchehab, mdroth, mjg59, normand, pemensik, pzhukov, steved, thaller, thozza | ||||||
Target Milestone: | --- | Keywords: | Patch | ||||||
Target Release: | --- | ||||||||
Hardware: | ppc64le | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | kernel-4.20.0-1.fc30 | Doc Type: | If docs needed, set a value | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2019-01-03 13:01:50 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1071880 | ||||||||
Attachments: |
Description
Menanteau Guy
2018-11-08 15:34:07 UTC
"dhclient: Can't install packet filter program: Unknown error 524", and no IPv4 address, is what I got when trying a 4.20-pre kernel on my F-28 system. From what I see in the x86 openQA instance for Rawhide composes, x86 also suffers from this "no IP" problem. Adding the kernel maintainers to CC; it might be something wrong on the kernel side.

Still a problem with kernel-4.20.0-0.rc1.git3.1.fc30. strace output from dhclient looks like:

    ...
    3756 socket(AF_PACKET, SOCK_RAW, 768) = 7
    3756 ioctl(7, SIOCGIFINDEX, {ifr_name="enp0s1", }) = 0
    3756 bind(7, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("enp0s1"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
    3756 setsockopt(7, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
    3756 setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x116fc27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
    3756 getpid() = 3756
    3756 send(3, "<27>Nov 13 12:13:21 dhclient[375"..., 90, MSG_NOSIGNAL) = 90
    3756 write(2, "Can't install packet filter prog"..., 54) = 54
    ...

Building a kernel with CONFIG_BPFILTER enabled to see if it helps.

Note that I found the problem while investigating an openQA test failure on the AtomicHost ISO (in my own openQA environment). This test is fine on x86-64, which is why I initially thought it was a ppc64le-specific problem.

The test on the AtomicHost ISO is OK on x86-64 with Fedora-Rawhide-20181112.n.0: https://openqa.stg.fedoraproject.org/tests/393668

Nothing to do with dhclient in this case. errno 524 (ENOTSUPP) is internal to the kernel/bpfilter(?) and should not be exposed to userspace (see getsockopt(2)).

Switching back to ppc64le; it seems x86_64 really isn't affected by this.

Indeed, the official openQA tests on ppc64le do seem to be suffering from this; the same tests on other arches are not. I just spent an hour rediscovering this, I should've looked for bug reports from Guy first :P I get the same error if I use rtl8139 as the network device rather than virtio-net, if it helps at all.
This looks related to capabilities. I had a system at hand (custom kernel "4.20.0-rc1.skt", ppc64le) where NetworkManager's dhclient would fail with this strace output:

    setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)

Interestingly, when starting dhclient in a terminal, it would succeed. So I removed

    CapabilityBoundingSet=CAP_NET_ADMIN CAP_DAC_OVERRIDE CAP_NET_RAW CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_MODULE CAP_AUDIT_WRITE CAP_KILL CAP_SYS_CHROOT

from /usr/lib/systemd/system/NetworkManager.service, and then dhclient started working with NetworkManager. Adding CAP_SYS_ADMIN to CapabilityBoundingSet made it work.

I see the problem even when running dhclient from the command line with "sudo dhclient enp0s1" (in a terminal app under XFCE), and the problem persists in NM with CAP_SYS_ADMIN added. Could these be two distinct issues, one of them ppc64/ppc64le-specific?

Thomas says the system he's testing on is ppc64le.

(In reply to Adam Williamson from comment #15)
> Thomas says the system he's testing on is ppc64le.

Right, I missed that :-)

What is the next step for this bug?
* Comment #12 proposed adding CAP_SYS_ADMIN to CapabilityBoundingSet in /usr/lib/systemd/system/NetworkManager.service.
* Is that only a workaround, or a proposed fix?

Michel, does adding CAP_SYS_ADMIN fix the problem for you? Because it didn't for me.

Created attachment 1511450: bug1647947_still_failed_despite_workaround.png

As shown in the attached image bug1647947_still_failed_despite_workaround.png, I tried the workaround from comment #12, modifying the NetworkManager.service file in an openQA test with the latest Rawhide compose (20181204). But despite reloading and restarting the service:
* we still get error 524 when installing the packet filter (the red text in the png file),
* and no IP address is assigned.

Created attachment 1511451: bug1647947_still_failed_despite_workaround.png

My previous image was incomplete, so I am replacing it with this new one.
Comment on attachment 1511450: bug1647947_still_failed_despite_workaround.png
* Keeping the first png to show the sed command used for the workaround in NetworkManager.service,
* and the second png to show the "ip a" command output.

Did you run systemctl daemon-reload (IIRC) after modifying the service file? Just modifying the service file and restarting the service won't do the trick. I can probably hack up a test which uses a modified NetworkManager package both during and after install, and see what happens with that...

Yes, I did the daemon-reload, as detailed in my local patch: https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/424a1787038557f134ebf3f899c688a39324adde?branch=debug_1647947

There are a couple of recently proposed patches, specific to ppc64, which I think may address this issue:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182399.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182454.html

I guess that's it: dhclient succeeds after manually setting bpf_jit_limit to a positive number.

Laura, Justin, could we maybe put those in Rawhide and see if the openQA tests start working again? Thanks!

------- Comment From hannsj_uhl.com 2018-12-07 07:21 EDT-------
Comment from Sandipan Das 2018-12-07 06:10:48 CST
A workaround would be to add something like the following to /etc/sysctl.conf. This way it will persist across reboots and nothing else has to be modified.

    net.core.bpf_jit_limit = 262144000

Yes, but that needs a successful installation first. AFAIK it's not possible to pass the setting through the kernel command line.

Could probably set it with sysctl from a shell in anaconda. I could try and hack the openQA tests to do that as a check...

(In reply to Adam Williamson from comment #29)
> could probably set it with sysctl from a shell in anaconda. I could try and
> hack the openQA tests to do that as a check...
I tried a patch (1) in some openQA tests and confirmed that the sysctl setting works around the problem for some install flows, though not all of them.
(1) https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/050466890c332a46285341daba6625367a68c314?branch=bug1647947_workaround

Yeah, the ones where the network needs to be working before you can get to a console won't be fixed, obviously. But if it works for at least some of the tests, that gives us a solid indication that this is the problem.

Oddly enough, I've noticed the network sometimes not being up on *x86_64* tests recently too (far less often than on ppc64, though). Not sure if this is somehow similar or entirely unrelated.

With the patch from https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I see dhclient is working again.

It seems like there's been a lot of discussion, so I'd like to wait until a patch hits a maintainer's tree. Once it gets committed, we can certainly bring it to Fedora.

------- Comment From hannsj_uhl.com 2018-12-17 03:38 EDT-------
(In reply to comment #15)
> With the patch from
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I
> see dhclient is working again.

... which has been accepted upstream in the bpf tree as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3
("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")

And it is now also in the mainline tree as
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3 (post-rc7)