Bug 1647947 - dhclient fails with "Can't install packet filter program: Unknown error 524" [ppc64le]
Summary: dhclient fails with "Can't install packet filter program: Unknown error 524" ...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2018-11-08 15:34 UTC by Menanteau Guy
Modified: 2019-01-03 13:01 UTC (History)

Fixed In Version: kernel-4.20.0-1.fc30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-03 13:01:50 UTC
Type: Bug
Embargoed:


Attachments
bug1647947_still_failed_despite_workaround.png (116.78 KB, image/png)
2018-12-04 18:03 UTC, Michel Normand
bug1647947_still_failed_despite_workaround.png (114.33 KB, image/png)
2018-12-04 18:12 UTC, Michel Normand


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 173962 0 None None None 2019-04-18 16:05:44 UTC

Description Menanteau Guy 2018-11-08 15:34:07 UTC
I don't get an IP address through DHCP when I run qemu to install an AtomicHost ppc64le ISO image.

Fedora-AtomicHost-ostree-ppc64le-Rawhide-20181105.n.1.iso

qemu command:
/usr/bin/qemu-system-ppc64 -name vm90 -enable-kvm -M pseries -smp 1 -m 8G -nographic -nodefaults -monitor stdio -serial pty -device virtio-net-pci,netdev=net10130,mac=c0:ff:ee:00:00:90 -netdev bridge,br=br0,id=net10130 -cdrom isolerawhide_atomic -drive file=hd1.qcow2 -drive file=hd2.qcow2 -boot d -S

Note that in my environment the guest should connect to a DHCP server and get an IP address based on the given MAC address.

When I reach the anaconda prompt to choose between starting VNC or text mode for the installation:

Starting installer, one moment...
anaconda 30.8-1.fc30 for Fedora Rawhide (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2
 * when reporting a bug add logs from /tmp as separate text/plain attachments
15:29:08 X startup failed, falling back to text mode
================================================================================
================================================================================

1) Start VNC
2) Use text mode

Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]: 

If I choose VNC, the guest does not get a valid IP address:

15:29:56 Starting VNC...
15:30:02 The VNC server is now running.
15:30:02 

WARNING!!! VNC server running with NO PASSWORD!
You can use the vncpassword=PASSWORD boot option
if you would like to secure the server.

15:30:02 Please manually connect your vnc client to IP-ADDRESS:1 to begin the install. Switch to the shell (Ctrl-B 2) and run 'ip addr' to find the IP-ADDRESS.
15:30:02 Attempting to start vncconfig

I can check that there is no IPv4 address on enp0s0:
[anaconda root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c0:ff:ee:00:00:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::58bd:d42a:2b9a:3878/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever


In the syslog, I can find:
...
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9460] bus-manager: (dhcp) accepted connection 0x1002b18e910 on private socket
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9466] dhcp4 (enp0s0): unmapped DHCP state 'PREINIT'
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9468] dhcp4 (enp0s0): DHCP state 'unknown' -> 'unknown' (reason: 'PREINIT')
15:30:24,948 DEBUG NetworkManager:<debug> [1541691024.9481] bus-manager: (dhcp) closed connection 0x1002b18e910 on private socket
15:30:24,949 ERR dhclient:Can't install packet filter program: Unknown error 524
15:30:24,950 ERR dhclient:or 524
15:30:24,950 ERR dhclient:This version of ISC DHCP is based on the release available
15:30:24,950 ERR dhclient:on ftp.isc.org. Features have been added and other changes
15:30:24,950 ERR dhclient:have been made to the base software release in order to make
15:30:24,950 ERR dhclient:it work better with this distribution.
15:30:24,950 ERR dhclient:ution.
15:30:24,951 ERR dhclient:Please report issues with this software via:
15:30:24,951 ERR dhclient:https://bugzilla.redhat.com/
15:30:24,951 ERR dhclient:ution. 
15:30:24,951 ERR dhclient:exiting.
15:30:24,953 INFO NetworkManager:<info>  [1541691024.9535] dhcp4 (enp0s0): client pid 2820 exited with status 1
15:30:24,953 INFO NetworkManager:<info>  [1541691024.9536] dhcp4 (enp0s0): state changed unknown -> done
15:30:24,953 DEBUG NetworkManager:<debug> [1541691024.9539] device[0x1002b1f45b0] (enp0s0): new DHCPv4 client state 3
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9540] device[0x1002b1f45b0] (enp0s0): DHCPv4 failed (ip_state conf)
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9542] device[0x1002b1f45b0] (enp0s0): remove_pending_action (1): 'dhcp4'
15:30:24,954 INFO NetworkManager:<info>  [1541691024.9545] dhcp4 (enp0s0): canceled DHCP transaction

Comment 1 Dan Horák 2018-11-08 15:42:31 UTC
"dhclient:Can't install packet filter program: Unknown error 524" and no IPv4 address is what I got when trying 4.20-pre kernel on my F-28 system.

Comment 2 Dan Horák 2018-11-08 16:42:48 UTC
And from what I see in the x86 openqa instance for Rawhide composes, also x86 suffers from this "no IP" problem.

Comment 3 Dan Horák 2018-11-09 11:35:32 UTC
adding kernel maintainers to CC, it might be something wrong on the kernel side.

Comment 4 Dan Horák 2018-11-09 11:45:25 UTC
still a problem with kernel-4.20.0-0.rc1.git3.1.fc30

Comment 5 Dan Horák 2018-11-13 12:42:43 UTC
strace output from dhclient looks like

...
3756  socket(AF_PACKET, SOCK_RAW, 768)  = 7
3756  ioctl(7, SIOCGIFINDEX, {ifr_name="enp0s1", }) = 0
3756  bind(7, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("enp0s1"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
3756  setsockopt(7, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
3756  setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x116fc27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
3756  getpid()                          = 3756
3756  send(3, "<27>Nov 13 12:13:21 dhclient[375"..., 90, MSG_NOSIGNAL) = 90
3756  write(2, "Can't install packet filter prog"..., 54) = 54
...

Building kernel with CONFIG_BPFILTER enabled to see if it helps.
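
A minimal Python sketch of the failing step in the strace output above, for anyone who wants to try the same setsockopt() call without dhclient. It attaches a one-instruction classic BPF filter. The helper names are hypothetical, the value 26 for SO_ATTACH_FILTER is taken from asm-generic/socket.h and is an assumption on architectures that define it differently, and the filter is not the 11-instruction program that dhclient installs.

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26  # assumption: value from <asm-generic/socket.h>
BPF_RET_K = 0x06       # BPF_RET | BPF_K: "return a constant"


def build_accept_all_filter():
    """Build a struct sock_fprog for a one-instruction 'accept' filter.

    Returns (fprog_bytes, instruction_buffer). The buffer must stay
    referenced while fprog_bytes is in use, because fprog_bytes only
    holds a raw pointer to it.
    """
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    insn = struct.pack("HBBI", BPF_RET_K, 0, 0, 0xFFFF)
    buf = (ctypes.c_char * len(insn)).from_buffer_copy(insn)
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", 1, ctypes.addressof(buf))
    return fprog, buf


def try_attach(sock):
    """Attach the filter to sock; return 0 on success, else the errno."""
    fprog, keepalive = build_accept_all_filter()
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
    except OSError as exc:
        return exc.errno
    return 0
```

Calling try_attach() on a plain UDP socket, e.g. socket.socket(socket.AF_INET, socket.SOCK_DGRAM), should return 0 on an unaffected Linux kernel and presumably 524 on an affected one. An AF_PACKET socket like the one dhclient opens additionally needs CAP_NET_RAW.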

Comment 6 Menanteau Guy 2018-11-13 15:18:48 UTC
Note that I found the problem while investigating an openQA test failure on the AtomicHost ISO (in my own openQA environment), but this test is fine on x86-64, which is why I initially thought it was a ppc64le-specific problem.
The test on the AtomicHost ISO is OK on x86-64 with Fedora-Rawhide-20181112.n.0: https://openqa.stg.fedoraproject.org/tests/393668

Comment 7 Pavel Zhukov 2018-11-14 08:21:59 UTC
Nothing to do with dhclient in this case. 
errno 524 (ENOTSUPP) is internal to the kernel/bpfilter(?) and should not be exposed to user space (see GETSOCKOPT(2)).
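
The "Unknown error 524" wording can be reproduced from user space: the value is not defined in the libc headers, so strerror() has no message for it. A small Python sketch (describe_errno is a hypothetical helper name, and the exact fallback text depends on the C library):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for an errno value.

    symbolic_name is None when the platform headers do not define
    the value, which is the case for kernel-internal codes.
    """
    name = errno.errorcode.get(code)
    return name, os.strerror(code)


for code in (errno.ENOENT, 524):
    print(code, describe_errno(code))
```

For 524 the name comes back as None and the message is only a generic fallback that embeds the number, which is exactly what dhclient printed.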

Comment 8 Dan Horák 2018-11-15 14:06:45 UTC
Switching back to ppc64le; it seems x86_64 really isn't affected by this.

Comment 9 Adam Williamson 2018-11-15 22:27:23 UTC
Indeed, the official openQA tests on ppc64le do seem to be suffering from this, same tests on other arches are not. I just spent an hour rediscovering this, I should've looked for bug reports from Guy first :P

Comment 10 Adam Williamson 2018-11-16 00:56:33 UTC
I get the same error if I use rtl8139 as the network device rather than virtio-net, if it helps at all.

Comment 11 Thomas Haller 2018-11-19 16:23:12 UTC
This looks related to capabilities.


I had a system at hand (custom kernel "4.20.0-rc1.skt", ppc64le), where NetworkManager's dhclient would fail with strace output:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)

Interestingly, when starting dhclient in a terminal, it would succeed. So, I removed

  CapabilityBoundingSet=CAP_NET_ADMIN CAP_DAC_OVERRIDE CAP_NET_RAW CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_MODULE CAP_AUDIT_WRITE CAP_KILL CAP_SYS_CHROOT

from /usr/lib/systemd/system/NetworkManager.service, and then dhclient started working with NetworkManager.

Comment 12 Thomas Haller 2018-11-19 16:26:25 UTC
adding CAP_SYS_ADMIN to CapabilityBoundingSet made it work.
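
A rough Python sketch for checking which capabilities a service is limited to, either from the CapabilityBoundingSet= line of the unit file or from the CapBnd mask in /proc/<pid>/status of the running process. The function names are hypothetical; the bit numbers are the ones from linux/capability.h.

```python
# Bit positions from <linux/capability.h>
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_ADMIN = 21


def parse_bounding_set(unit_line):
    """Return the set of capability names in a CapabilityBoundingSet= line."""
    key, _, value = unit_line.partition("=")
    if key.strip() != "CapabilityBoundingSet":
        raise ValueError("not a CapabilityBoundingSet line")
    return set(value.split())


def bounding_mask_from_status(status_text):
    """Extract the CapBnd bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


def has_capability(mask, cap_bit):
    """True if the given capability bit is set in the mask."""
    return bool(mask & (1 << cap_bit))
```

For the running daemon, something like has_capability(bounding_mask_from_status(open("/proc/self/status").read()), CAP_SYS_ADMIN) shows whether CAP_SYS_ADMIN survived the bounding set (replace self with the PID of NetworkManager or dhclient).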

Comment 13 Dan Horák 2018-11-20 10:33:32 UTC
I see the problem even when running dhclient from the command line with "sudo dhclient enp0s1" (in a terminal app under XFCE).

Comment 14 Dan Horák 2018-11-20 13:07:44 UTC
And it is still a problem in NM with CAP_SYS_ADMIN added. Could it be 2 distinct issues, with one ppc64/ppc64le specific?

Comment 15 Adam Williamson 2018-11-20 16:27:03 UTC
Thomas says the system he's testing on is ppc64le.

Comment 16 Dan Horák 2018-11-20 16:43:09 UTC
(In reply to Adam Williamson from comment #15)
> Thomas says the system he's testing on is ppc64le.

right, I missed that :-)

Comment 17 Michel Normand 2018-12-04 17:26:17 UTC
What is the next step for this bug?

* In comment#12 there was a proposal to add CAP_SYS_ADMIN to CapabilityBoundingSet in /usr/lib/systemd/system/NetworkManager.service.

* Is it only a workaround or a proposed correction?

Comment 18 Dan Horák 2018-12-04 17:42:44 UTC
Michel, does adding CAP_SYS_ADMIN fix the problem for you? Because it didn't for me.

Comment 19 Michel Normand 2018-12-04 18:03:45 UTC
Created attachment 1511450 [details]
bug1647947_still_failed_despite_workaround.png

As shown in the attached image bug1647947_still_failed_despite_workaround.png, I tried the workaround from comment#12, modifying the NetworkManager.service file in an openQA test with the latest Rawhide compose (20181204).
But despite a service reload and restart:
* we still get error 524 when installing the packet filter (the red text in the png file)
* and no IP address is assigned.

Comment 20 Michel Normand 2018-12-04 18:12:00 UTC
Created attachment 1511451 [details]
bug1647947_still_failed_despite_workaround.png

My previous image was not complete, so I am replacing it with this new one.

Comment 21 Michel Normand 2018-12-04 18:16:52 UTC
Comment on attachment 1511450 [details]
bug1647947_still_failed_despite_workaround.png

* keep the first png to show the sed command for the workaround in NetworkManager.service
* and the second png to show the output of the ip a command.

Comment 22 Adam Williamson 2018-12-04 18:35:20 UTC
Did you do systemctl daemon-reload (IIRC) after modifying the service file? Just modifying the service file and restarting the service won't do the trick.

I can actually probably hack up a test which uses a modified NetworkManager package both during and after install, and see what happens with that...

Comment 23 Michel Normand 2018-12-04 19:02:31 UTC
yes I did the daemon-reload as detailed in my local patch https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/424a1787038557f134ebf3f899c688a39324adde?branch=debug_1647947

Comment 24 Michael Roth 2018-12-06 17:21:25 UTC
There are a couple recently-proposed patches, specific to ppc64, which I think may address this issue:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182399.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182454.html

Comment 25 Dan Horák 2018-12-06 17:42:56 UTC
I guess that's it, dhclient succeeds after manually setting bpf_jit_limit to a positive number.

Comment 26 Adam Williamson 2018-12-06 19:43:49 UTC
Laura, Justin, could we maybe put those in Rawhide and see if the openQA tests start working again? thanks!

Comment 27 IBM Bug Proxy 2018-12-07 12:30:53 UTC
------- Comment From hannsj_uhl.com 2018-12-07 07:21 EDT-------
Comment from  Sandipan Das 2018-12-07 06:10:48 CST

A workaround would be to add something like the following in /etc/sysctl.conf. This way it will persist across reboots and nothing else has to be modified.

net.core.bpf_jit_limit = 262144000
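
To check whether a running system needs the workaround, the current value can be read back from procfs. A sketch in Python, assuming the usual mapping of sysctl names to files under /proc/sys; the helper names are hypothetical:

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its procfs path."""
    return root + "/" + name.replace(".", "/")


def parse_sysctl_int(text):
    """Parse the contents of a single-integer sysctl file."""
    return int(text.strip())


def limit_is_usable(text):
    """True if the value read from bpf_jit_limit is a positive number."""
    return parse_sysctl_int(text) > 0


def read_bpf_jit_limit():
    """Read net.core.bpf_jit_limit from the running kernel."""
    with open(sysctl_path("net.core.bpf_jit_limit")) as handle:
        return parse_sysctl_int(handle.read())
```

read_bpf_jit_limit() only works on kernels that have the knob. A zero or negative result would suggest that the limit is the reason the filter cannot be installed.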

Comment 28 Dan Horák 2018-12-07 12:41:01 UTC
Yes, but it needs a successful installation first. AFAIK it's not possible to pass the setting through the kernel command line.

Comment 29 Adam Williamson 2018-12-07 18:20:44 UTC
could probably set it with sysctl from a shell in anaconda. I could try and hack the openQA tests to do that as a check...

Comment 30 Michel Normand 2018-12-11 16:01:36 UTC
(In reply to Adam Williamson from comment #29)
> could probably set it with sysctl from a shell in anaconda. I could try and
> hack the openQA tests to do that as a check...

I tried a patch (1) for some openQA tests and confirmed that a sysctl call allows bypassing the problem for some install flows, but not all of them.

(1) https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/050466890c332a46285341daba6625367a68c314?branch=bug1647947_workaround

Comment 31 Adam Williamson 2018-12-11 16:57:44 UTC
yeah, ones where the network needs to be working before you can get to a console won't be fixed, obviously. but if it works for at least some of the tests, it gives us a solid indication that is the problem.

oddly enough, I've noticed the network sometimes not being up on *x86_64* tests recently too (far less often than on ppc64, though). not sure if this is something somehow similar, or entirely unrelated.

Comment 32 Dan Horák 2018-12-11 17:45:24 UTC
With the patch from https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I see dhclient is working again.

Comment 33 Laura Abbott 2018-12-11 17:55:21 UTC
It seems like there's been a lot of discussion so I'd like to wait until a patch hits a maintainer's tree. Once it gets committed we can certainly bring it to Fedora.

Comment 34 IBM Bug Proxy 2018-12-17 08:40:28 UTC
------- Comment From hannsj_uhl.com 2018-12-17 03:38 EDT-------
(In reply to comment #15)
> With the patch from
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I
> see dhclient is working again.
>
.
... which is upstream accepted in the bpf tree as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3
("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
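
The commit title hints at why only 64K-page systems such as ppc64le were hit. A back-of-the-envelope sketch in Python, assuming that the default limit was computed as PAGE_SIZE * 40000 and stored in a 32-bit signed int. That assumption is not stated in this bug; it is recalled from the 4.20-rc sources.

```python
import ctypes

JIT_LIMIT_PAGES = 40000  # assumption: multiplier used for the default limit


def default_limit_as_int32(page_size):
    """Return page_size * 40000 as it would look in a 32-bit signed int."""
    # ctypes integer types do no overflow checking, so the value wraps
    # the same way a C int would on the affected kernels.
    return ctypes.c_int32(page_size * JIT_LIMIT_PAGES).value


for page_size in (4096, 65536):
    print(page_size, default_limit_as_int32(page_size))
```

With 4K pages the product (163840000) fits. With 64K pages it exceeds the int range and wraps to a negative value, which would match the observation that dhclient works again once bpf_jit_limit is set to a positive number. The workaround value 262144000 equals 65536 * 4000.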

Comment 35 Dan Horák 2018-12-23 10:00:58 UTC
And now also in the mainline tree as https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3 (post-rc7)

