Bug 1095891 - systemd-212-4 causes Live images to hang on boot in 1 CPU guests
Summary: systemd-212-4 causes Live images to hang on boot in 1 CPU guests
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1096386 1097606
Depends On:
Blocks:
Reported: 2014-05-08 18:52 UTC by Josh Boyer
Modified: 2014-05-28 19:47 UTC
CC List: 20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-05-28 19:47:58 UTC
Type: Bug
Embargoed:


Attachments
systemd.log_level=debug output of hang (deleted)
2014-05-08 18:59 UTC, Josh Boyer
systemd.log_level=debug output of boot when CPUs=2 (deleted)
2014-05-08 19:00 UTC, Josh Boyer
screenshot (deleted)
2014-05-28 06:07 UTC, Kay Sievers


Links
FreeDesktop.org bug 79283

Description Josh Boyer 2014-05-08 18:52:06 UTC
Description of problem:

Downloading today's live images (20140508) and attempting to boot them in a KVM guest with 1 CPU results in a hang at the Basic System target.  The images from 20140507, which were composed with systemd-212-2, boot fine.

Version-Release number of selected component (if applicable):

systemd-212-4

How reproducible:

Always (tested Workstation and XFCE live images)


Steps to Reproduce:
1. Download live image
2. Create KVM guest with 1 CPU
3. Boot
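
For anyone reproducing this without virt-manager, the steps above can be sketched as a small Python helper that builds a plain QEMU command line. This is only an illustration: the ISO file name and the memory size are placeholders, and the original report does not say which front end or options were used to create the guest.

```python
import shlex


def qemu_command(iso_path, cpus=1, memory_mib=2048):
    """Build a qemu-system-x86_64 argv that boots a live ISO in a KVM guest.

    iso_path and memory_mib are illustrative values, not taken from the report.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",          # use KVM acceleration
        "-smp", str(cpus),      # number of guest CPUs (1 hangs, 2 boots)
        "-m", str(memory_mib),  # guest RAM in MiB
        "-cdrom", iso_path,     # attach the live image as a CD-ROM
    ]


# Print the failing (1 CPU) and working (2 CPU) variants for comparison.
# "live.iso" is a placeholder for the downloaded image.
for cpus in (1, 2):
    print(shlex.join(qemu_command("live.iso", cpus=cpus)))
```

Running the two printed commands against the same ISO keeps the CPU count as the only difference between the guests.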

Actual results:

Boot hangs at "Reached target Basic System."

Expected results:

Boots to the live desktop

Additional info:

I tried modifying the amount of memory allocated first, but that didn't seem to make a difference.  The images clearly work if the KVM guest has 2 CPUs, but not if it has one.

I noticed there was a rather large patch to udev in this systemd release.  It's possible that broke booting in this scenario.

Comment 1 Josh Boyer 2014-05-08 18:59:46 UTC
Created attachment 893731
systemd.log_level=debug output of hang

Comment 2 Josh Boyer 2014-05-08 19:00:44 UTC
Created attachment 893732
systemd.log_level=debug output of boot when CPUs=2

Comment 3 Josh Boyer 2014-05-08 19:01:51 UTC
The attachments above are from the exact same VM instance, both using the XFCE 20140508 live image ISO.  The only difference between them is that the hang situation has 1 CPU in the VM and the working boot has 2 CPUs in the VM.

Comment 4 Zing 2014-05-09 17:47:43 UTC
Seeing this also in my rawhide qemu-kvm install.

Single cpu:

systemd-212-4.fc21.x86_64 +
kernel-3.15.0-0.rc4.git1.1.fc21.x86_64 - boots
kernel-3.15.0-0.rc4.git2.1.fc21.x86_64 - hangs at Reached Basic System
kernel-3.15.0-0.rc4.git3.1.fc21.x86_64 - hangs at Reached Basic System

Two cpu:

systemd-212-4.fc21.x86_64 +
kernel-3.15.0-0.rc4.git1.1.fc21.x86_64 - boots
kernel-3.15.0-0.rc4.git2.1.fc21.x86_64 - boots
kernel-3.15.0-0.rc4.git3.1.fc21.x86_64 - boots

Comment 5 Bill Gianopoulos 2014-05-10 14:11:21 UTC
*** Bug 1096386 has been marked as a duplicate of this bug. ***

Comment 6 Bill Gianopoulos 2014-05-11 17:02:24 UTC
(In reply to Josh Boyer from comment #3)
> The attachments above are from the exact same VM instance, both using the
> XFCE 20140508 live image ISO.  The only difference between them is that the
> hang situation has 1 CPU in the VM and the working boot has 2 CPUs in the VM.

I am curious which systemd version this last worked with.  I ask because between 212-2, which I am assuming works, and 212-4 there were 2 changes, each in a separate build.  So, does this work with 212-3?  It seems to me the uuidd change is more likely to be causing my issue as it is more related to the way I am booting.

Comment 7 Bill Gianopoulos 2014-05-11 17:57:54 UTC
(In reply to Bill Gianopoulos from comment #6)
> (In reply to Josh Boyer from comment #3)
> > The attachments above are from the exact same VM instance, both using the
> > XFCE 20140508 live image ISO.  The only difference between them is that the
> > hang situation has 1 CPU in the VM and the working boot has 2 CPUs in the VM.
> 
> I am curious which systemd version this last worked with.  I ask because
> between 212-2, which I am assuming works, and 212-4 there were 2 changes,
> each in a separate build.  So, does this work with 212-3?  It seems to me
> the uuidd change is more likely to be causing my issue as it is more
> related to the way I am booting.

The reason I say this is that I am not doing a network boot but am booting using UUIDs.  So I am just thinking that perhaps the UUID patch is more relevant to my issue.

Comment 8 Adam Williamson 2014-05-13 00:04:14 UTC
Confirming this with F21 virt host here and a live image composed from today's Rawhide: consistently fails to boot with a guest with a single CPU. Haven't checked with a bare metal system yet.

Comment 9 David Shea 2014-05-14 15:53:35 UTC
*** Bug 1097606 has been marked as a duplicate of this bug. ***

Comment 10 Bill Gianopoulos 2014-05-16 15:20:02 UTC
What is the status of this issue?  This is kind of important to fix, as it is not possible to get new users running rawhide on a single CPU system.

Also, it is keeping people like me from testing the latest kernel because I am stuck on version 3.15.0-0.rc4.git1.1.  If I try to update the kernel, the resulting kernel will not boot.  I can, at least, test other packages.

I tried downgrading systemd to see if that would help, but I can't get that to work using yum because of a cyclic dependency issue.

Comment 11 Bill Gianopoulos 2014-05-18 12:28:18 UTC
OK I figured out my dependency issue and downgraded systemd to 212-3 and then re-installed the 3.15.0-0.rc5.git2.9 kernel and that results in a successful boot.  Therefore this issue is definitely a result of the change between systemd 212-3 and 212-4 which, according to the changelog, is:

* Wed May 07 2014 Kay Sievers <kay> - 212-4 - add netns udev workaround

Comment 12 Bill Gianopoulos 2014-05-18 12:54:21 UTC
Just to be excruciatingly clear here.  The purpose of re-installing the kernel was to force a rebuild of initramfs with the downgraded version of systemd.
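
As an aside, reinstalling the kernel is one way to get a fresh initramfs; another way that is commonly used is to call dracut directly. The sketch below only builds the command and does not run it. It assumes dracut is the initramfs generator on the system and that its --force and --kver options are available; the helper name is made up for illustration, and the kernel version passed in the example is simply the running kernel, not the one discussed in this bug.

```python
import platform


def rebuild_initramfs_command(kernel_version):
    """Return a dracut argv that regenerates the initramfs for one kernel.

    --force overwrites the existing image, --kver selects the kernel version.
    The command must be run as root; this helper only constructs it.
    """
    return ["dracut", "--force", "--kver", kernel_version]


# Example: show the command for the currently running kernel.
print(" ".join(rebuild_initramfs_command(platform.release())))
```

Either approach has the same goal as described above: make sure the initramfs contains the downgraded systemd rather than the one it was originally built with.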

Comment 13 Bill Gianopoulos 2014-05-24 01:43:24 UTC
OK this has gone on long enough.  A patch that its own description calls a workaround, so not a proper fix for anything, is preventing single CPU systems from being able to boot.  This "fix" needs to be reverted ASAP.

Comment 14 Adam Williamson 2014-05-24 02:27:14 UTC
lennart and kay are travelling ATM, and harald's been off work lately, that's why systemd/udev stuff is taking longer than usual. the rest of us are usually reluctant to touch those bits unless we're really sure what we're doing, but i might try a systemd build with the changes from 3 to 4 reverted later.

Comment 15 Adam Williamson 2014-05-24 02:32:20 UTC
the change between 3 and 4 has a rather different description upstream, btw:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=9ea28c55a2488e6cd4a44ac5786f12b71ad5bc9f

"udev: remove seqnum API and all assumptions about seqnums"

Comment 16 Bill Gianopoulos 2014-05-24 10:15:20 UTC
(In reply to Adam Williamson from comment #14)
> lennart and kay are travelling ATM, and harald's been off work lately,
> that's why systemd/udev stuff is taking longer than usual. the rest of us
> are usually reluctant to touch those bits unless we're really sure what
> we're doing, but i might try a systemd build with the changes from 3 to 4
> reverted later.

Sorry, sometimes I get a bit impatient.  In order to fix a different issue I wanted to do a new clean install, and there has simply been no way to do that for a while now.

Comment 17 Adam Williamson 2014-05-27 00:02:42 UTC
OK, so I hit a small road bump reproducing this for testing purposes - it doesn't seem to happen at least for me with non-debug kernels. But it looks like it's reliably reproducible with debug kernels.

Today's (2014-05-26) Xfce nightly reliably reproduces this issue in a single-CPU KVM guest for me: six boot attempts, six fails. I built an Xfce live locally with the same kernel (3.15.0-0.rc6.git1.1.fc21.x86_64) but with a systemd scratch build with the patch from -4 dropped. Tried five boots on the same KVM, got five successes. Seems pretty definitive.
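
To put a rough number on "pretty definitive": if the patch made no difference, and the 11 boots are treated as independent trials with 6 failures and 5 successes overall, the chance that all 6 failures land on the patched image is 1 in C(11, 6). The snippet below is just that arithmetic (a one-sided exact test on a 2x2 table); the independence assumption is mine, not something established in this report.

```python
from math import comb


def prob_perfect_split(fails_with_patch, successes_without_patch):
    """Probability that every failure falls in the patched group by chance.

    Assumes the outcomes are fixed and randomly assigned to the two groups,
    i.e. the null hypothesis that the patch has no effect.
    """
    total = fails_with_patch + successes_without_patch
    return 1 / comb(total, fails_with_patch)


# Six failed boots with the patch, five successful boots without it.
ways = comb(11, 6)
print(f"1 in {ways}: {prob_perfect_split(6, 5):.4f}")
```

With these counts there are 462 possible ways to distribute the failures, and only one of them matches what was observed.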

I don't know what that patch is intended to fix, or why it was considered sufficiently important to be backported, but I can't imagine that it could be something *worse* than this, so I'm going to go ahead and push out a systemd -5 with the patch reverted. Thanks, Bill, for identifying the offending component.

Comment 18 Adam Williamson 2014-05-27 00:10:45 UTC
as this seems like a very serious issue, I've reported it directly to upstream as https://bugs.freedesktop.org/show_bug.cgi?id=79283 just to be safe (though I'm sure Kay would look after it upstream in any case).

Comment 19 Kay Sievers 2014-05-27 00:40:14 UTC
Without this patch, the installer will hang or not work, because of the way
network namespaces are implemented in the kernel: they break udev by "stealing"
expected seqnums from the host's primary namespace.
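
A toy model of the "stolen seqnum" problem, for readers unfamiliar with it. This is NOT udev's code; the function names and the event numbers are made up for illustration. It only shows why a consumer that expects to see every sequence number gets stuck when one event is delivered to a different network namespace.

```python
def missing_seqnums(received, kernel_seqnum):
    """Return the seqnums up to kernel_seqnum that this listener never saw."""
    return sorted(set(range(1, kernel_seqnum + 1)) - set(received))


def naive_settled(received, kernel_seqnum):
    """A consumer that assumes it sees every seqnum considers itself
    settled only when nothing is missing (hypothetical logic)."""
    return not missing_seqnums(received, kernel_seqnum)


# The kernel has issued events 1..5, but event 3 belonged to a device in
# another network namespace, so the host listener never received it.
host_view = [1, 2, 4, 5]
print(missing_seqnums(host_view, 5))   # the gap left by the other namespace
print(naive_settled(host_view, 5))     # never becomes True while the gap exists
```

Dropping the assumption that seqnums arrive without gaps, as the upstream commit title describes, removes this failure mode.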

The base OS recently started to use network namespaces, PrivateNetwork=yes
in unit files, so this will show up again.

Is there a simple way to reproduce the "single CPU" issue? It sounds pretty
strange.

Are we sure that it hangs forever, and not only for a few minutes before it
continues?

Could you try to boot with plymouth disabled on the kernel command
line?

(Lennart and I are still on vacation this week, without proper internet.)

Comment 20 Adam Williamson 2014-05-27 01:15:11 UTC
"Is there a simple way to reproduce the "single CPU" issue? It sounds pretty
strange."

Very simple. Grab the nightly I linked above. Set up a normal KVM (I'm using virt-manager) with a single CPU. Try and boot it. Profit. Add a CPU, it'll boot fine. Use systemd 212-3 or 212-5, it'll boot fine.

"Are we sure that is hangs for forever, not only for a few minutes and the
continues?"

I didn't leave mine for terribly long, don't know about the other reporters. I can leave one sitting here while I make dinner.

"Could you try to boot with plymouth disabled on the kernel command
line?"

The hang is before plymouth kicks in, I think, but sure, easy enough to try...

...boot without 'rhgb quiet' and with 'rd.plymouth=0 plymouth.enable=0' still hangs. I'll leave this attempt sitting here for a while.
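
For clarity, the command line change described above can be sketched as a small Python helper: drop 'rhgb' and 'quiet', then append the two plymouth switches. The starting command line in the example is invented for illustration; the real live image boot entry contains other arguments.

```python
def adjust_cmdline(cmdline,
                   remove=("rhgb", "quiet"),
                   add=("rd.plymouth=0", "plymouth.enable=0")):
    """Return cmdline without the 'remove' arguments and with 'add' appended."""
    args = [arg for arg in cmdline.split() if arg not in remove]
    for arg in add:
        if arg not in args:  # avoid adding the same switch twice
            args.append(arg)
    return " ".join(args)


# Hypothetical starting command line, not copied from the actual image.
original = "root=live:CDLABEL=Example rd.live.image rhgb quiet"
print(adjust_cmdline(original))
```

The result is what would be typed at the boot loader's edit prompt in place of the default arguments.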

Comment 21 Kay Sievers 2014-05-27 10:59:15 UTC
(In reply to Adam Williamson from comment #20)
> "Is there a simple way to reproduce the "single CPU" issue? It sounds pretty
> strange."
> 
> Very simple. Grab the nightly I linked above.

Care to add an exact link here? I don't see it. We are in China, and downloading large files might not work too well, so it would be nice to get the right one. :)

Comment 22 Harald Hoyer 2014-05-27 11:48:57 UTC
my guess is that it also needs
http://cgit.freedesktop.org/systemd/systemd/commit/?id=83be2c398589a3d64db5999cfd5527c5219bff46

which fixes the "udevadm settle" issue introduced by 9ea28c55a2488e6cd4a44ac5786f12b71ad5bc9f

Comment 23 Gene Czarcinski 2014-05-27 14:53:17 UTC
hanging forever ... well, a couple of hours is close enough

Comment 24 Adam Williamson 2014-05-27 14:59:38 UTC
kay: sorry, the link was in the other bug report: http://kojipkgs.fedoraproject.org/work/tasks/1167/6891167/Fedora-Live-Xfce-x86_64-rawhide-20140526.iso

Comment 25 Kay Sievers 2014-05-28 06:07:27 UTC
Created attachment 899832
screenshot

Adding "debug" to the kernel commandline shows the output like in the
attached screenshot. Dracut hangs in a loop. It looks like the issue
Harald pointed out above.

I really have no idea what kind of voodoo is going on that makes it
behave differently with one CPU than with more.

A new release is coming out today, and should have the fix.

Comment 26 Kay Sievers 2014-05-28 11:00:36 UTC
Submitted to rawhide.

Comment 27 Adam Williamson 2014-05-28 19:47:58 UTC
I tested a live image with systemd 213-3 and kernel 3.15.0-0.rc7.git1.1.fc21.x86_64 (a debug kernel). Booted successfully three times in a row, install seemed fine up until it hit https://bugzilla.redhat.com/show_bug.cgi?id=1101557 , which looks to have been going on for a while. So I'd say this is looking fixed.

