Bug 1719057 - Installer boot fails if any option requiring network access during initramfs phase is used
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: F31BetaBlocker
Reported: 2019-06-10 23:26 UTC by Adam Williamson
Modified: 2019-07-03 15:22 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-03 15:22:29 UTC
Type: Bug
Embargoed:



Description Adam Williamson 2019-06-10 23:26:50 UTC
In openQA testing of yesterday's Rawhide compose, all the kickstart tests failed. So did the tests that use an updates image that is hosted on a network server. In each case, the test failed because the system failed to boot to the installer, instead booting to the dracut rescue prompt.

I think what's going on here is any scenario which requires the network to be brought up during the initramfs phase - which includes the use of a kickstart or updates image retrieved over the network - causes the boot to fail.
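As a rough illustration of which boot command lines fall into this category, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the helper name `network_boot_options` is hypothetical, only `inst.ks` is named in this report (`inst.updates` is assumed to be the option used for a network-hosted updates image), and the list of URL prefixes is a guess at what counts as "needs the network". It is not taken from anaconda or dracut.

```python
import shlex

# Assumption: installer options that may point at a network location.
# Only inst.ks is named in this report; inst.updates is assumed.
NETWORK_CAPABLE_OPTIONS = ("inst.ks", "inst.updates")

# Assumption: value prefixes that imply the network must be up in the initramfs.
NETWORK_PREFIXES = ("http://", "https://", "ftp://", "nfs:")


def network_boot_options(cmdline):
    """Return the boot options in cmdline that fetch something over the network."""
    hits = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if sep and key in NETWORK_CAPABLE_OPTIONS and value.startswith(NETWORK_PREFIXES):
            hits.append(token)
    return hits
```

A command line for which this returns a non-empty list would, under the theory above, be expected to hit the bug on the affected compose.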

Here are the failed tests:

https://openqa.fedoraproject.org/tests/410060
https://openqa.fedoraproject.org/tests/410063
https://openqa.fedoraproject.org/tests/410048
https://openqa.fedoraproject.org/tests/410008
https://openqa.fedoraproject.org/tests/410265
https://openqa.fedoraproject.org/tests/410264

I'm blaming this on NetworkManager because the tests passed on the previous compose (20190604.n.0) and neither anaconda nor dbus nor any other obvious suspect changed in the 0609.n.1 compose. NetworkManager *did* change, and the changelog looks a bit suspicious for this bug:

  * Tue Jun 04 2019 Lubomir Rintel <lkundrak> - 1:1.20.0-0.2
  - Update the 1.20.0 snapshot
  - Re-enable the initrd generator

Those sure look like relevant changes to me.

This is pretty easy to reproduce: download an installer image from the 20190609.n.1 compose (e.g. https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20190609.n.1/compose/Server/x86_64/iso/Fedora-Server-dvd-x86_64-Rawhide-20190609.n.1.iso), boot it, and add a kickstart or updates.img from a network server to the boot options, e.g. 'inst.ks=http://fedorapeople.org/groups/qa/kickstarts/firewall-configured-net.ks'. That should be enough to trigger the bug.

Proposing as a Beta blocker as a violation of "The installer must be able to use all available kickstart delivery methods" - https://fedoraproject.org/wiki/Fedora_30_Beta_Release_Criteria#Kickstart_delivery .

Comment 1 Adam Williamson 2019-06-11 21:16:28 UTC
Looking at the journal from the rescue shell, there seems to be a cycle of NetworkManager starting up, running into three dbus errors because dbus is not running (I'm not sure whether that's expected or not in the initramfs environment), exiting with the network device in state 'disconnected', then restarting and going through the whole cycle again. It does this hundreds of times. The end of the process looks like this:

device (ens3): carrier: link connected
manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
device (ens3): state change: unmanaged -> unavailable
sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
firewall: could not connect to system D-Bus (Could not connect: No such file or directory)
ifcfg-rh: dbus: couldn't initialize system bus: Could not connect: No such file or directory
device (ens3): state change: unavailable -> disconnected
manager: startup complete
quitting now that startup is complete
exiting (success)

Then half a second later it starts up again:

NetworkManager (version 1.20.0-0.2.fc31) is starting... (after a restart)

and goes through the same process.
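To put a number on "hundreds of times", the cycle could be counted from a saved copy of the journal. The following is a minimal Python sketch, not a tool anyone in this report used: the function name `summarize_nm_cycles` is hypothetical, and the marker strings are simply copied from the excerpt above, so they are assumed to be stable across the whole log.

```python
import re

# Marker strings copied from the journal excerpt in this comment.
START_RE = re.compile(r"NetworkManager \(version [^)]+\) is starting\.\.\.")
EXIT_MARKER = "exiting (success)"
DBUS_ERROR_MARKER = "Could not connect: No such file or directory"


def summarize_nm_cycles(journal_lines):
    """Count NetworkManager starts, clean exits, and D-Bus connection errors."""
    summary = {"starts": 0, "exits": 0, "dbus_errors": 0}
    for line in journal_lines:
        if START_RE.search(line):
            summary["starts"] += 1
        elif EXIT_MARKER in line:
            summary["exits"] += 1
        elif DBUS_ERROR_MARKER in line:
            summary["dbus_errors"] += 1
    return summary


# One pass through the cycle, using lines from the excerpt above.
sample = [
    "sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory",
    "firewall: could not connect to system D-Bus (Could not connect: No such file or directory)",
    "ifcfg-rh: dbus: couldn't initialize system bus: Could not connect: No such file or directory",
    "exiting (success)",
    "NetworkManager (version 1.20.0-0.2.fc31) is starting... (after a restart)",
]
print(summarize_nm_cycles(sample))  # {'starts': 1, 'exits': 1, 'dbus_errors': 3}
```

On a full journal, a start count in the hundreds with roughly three D-Bus errors per start would match the loop described here.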

Comment 2 Lubomir Rintel 2019-06-12 13:05:44 UTC
Thanks for the report. The fix for dracut is here: https://github.com/dracutdevs/dracut/pull/578

If the dracut maintainers are willing to review and apply the patch, I'd prefer that we not revert the change in NetworkManager.

Comment 3 Adam Williamson 2019-06-12 15:10:12 UTC
It just so happens I'm a proven packager. Soo...;)

Comment 4 Adam Williamson 2019-06-12 15:23:58 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=35504593

Let's see how the next compose goes.

Comment 5 Lubomir Rintel 2019-06-14 07:13:46 UTC
(In reply to Adam Williamson from comment #3)
> It just so happens I'm a proven packager. Soo...;)

Ah, okay, me too, but I thought this sort of thing should get an upstream ack.
Guess this is all right, thanks for doing that.

Comment 6 Adam Williamson 2019-06-14 15:53:20 UTC
eh, if upstream doesn't like it he can take it out again. :P I like composes that work!

Unfortunately we're not getting any composes at all ATM, I think partly because of the libgit2 module drama, so I don't know if this is fixed yet.

Comment 7 Dan Horák 2019-07-01 15:57:42 UTC
I suspect bug #1725872 might be another variant of this one ...

Comment 8 Adam Williamson 2019-07-03 15:22:29 UTC
This one was actually fixed by the change I made back on June 12, thanks for the reminder to close it :)

