Bug 1214441 - btrfs raid1 fails to boot after successful installation from live image
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: spin-kickstarts
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeroen van Meeuwen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker https://fedoraproject...
Duplicates: 1246906
Depends On:
Blocks:
Reported: 2015-04-22 17:44 UTC by James Patterson
Modified: 2016-03-23 21:13 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-05 19:45:56 UTC
Type: Bug
Embargoed:


Attachments
/var/log/anaconda from kvm (145.19 KB, application/zip)
2015-04-27 07:29 UTC, James Patterson
screenshot, grub edit (23.78 KB, image/png)
2015-04-27 19:26 UTC, Chris Murphy
screenshot, grub can't find command (14.34 KB, image/png)
2015-04-27 19:27 UTC, Chris Murphy
anaconda-ks-cfg reproducing c14 (1.65 KB, text/plain)
2015-04-27 19:42 UTC, Chris Murphy
anaconda.log, c14 (92.27 KB, text/plain)
2015-04-27 19:43 UTC, Chris Murphy
program.log, c14 (42.04 KB, text/plain)
2015-04-27 19:44 UTC, Chris Murphy
storage.log, c14 (1.10 MB, text/plain)
2015-04-27 19:44 UTC, Chris Murphy
grub.cfg, c14 (3.96 KB, text/plain)
2015-04-27 19:51 UTC, Chris Murphy
debug grub2-mkconfig, without initramfs, c26 (59.32 KB, text/plain)
2015-04-28 03:41 UTC, Chris Murphy
debug grub2-mkconfig, with initramfs, c26 (55.70 KB, text/plain)
2015-04-28 03:42 UTC, Chris Murphy

Description James Patterson 2015-04-22 17:44:02 UTC
Description of problem:
btrfs, 2 disks, raid 1, luks.

Complains about unknown command /dev/luks-something on first boot.
Then systemd hangs forever with "A start job for dev-mapper-luks-something".

Comment 1 James Patterson 2015-04-22 18:05:26 UTC
The linux16 line in grub.cfg contains a newline. Removing it stops the error about the invalid command, but systemd still hangs at "Running start job for device dev-mapper luks-blah".

Comment 2 James Patterson 2015-04-22 18:13:52 UTC
Workaround: in the installer uncheck the box to encrypt the swap partition.

Comment 3 Fedora Blocker Bugs Application 2015-04-23 19:51:45 UTC
Proposed as a Blocker for 22-final by Fedora user jamespatterson using the blocker tracking app because:

 A system installed with a btrfs filesystem, raid-1, and encrypted is unbootable.

This is due to a bug with the swap partition, and due to a newline in the grub config: https://bugzilla.redhat.com/show_bug.cgi?id=1214441

Comment 4 David Shea 2015-04-24 13:28:48 UTC
Please post the logs from the install, available in /tmp in the installation environment and copied to /var/log/anaconda in the installed system.

Comment 5 James Patterson 2015-04-24 14:12:43 UTC
How can I do that? The machine won't boot. The only way around it is to deselect the "encrypt" checkbox for the swap partition with a reinstall.

Comment 6 David Shea 2015-04-24 14:17:00 UTC
Can you get to the filesystem when you boot the installer in rescue mode?

Comment 7 James Patterson 2015-04-24 15:04:24 UTC
No because rescue mode can't be reached until swapon is run.

I don't have the original machine any more, but I set up a KVM machine. The GRUB boot config is wrong.

It says linux16 /vmli.. root=/dev/mapper/luks-xxx LANG=en_US.UTF-8\n
/dev/mapper/luks-... ro rootflags=...
etc.

Remove that \n and the dangling /dev/mapper/luks-... part that follows it, and I guess it will boot.

Unfortunately I won't have access to this kvm until Monday now, but it seems to solve the problem.

Comment 8 James Patterson 2015-04-24 15:08:52 UTC
Yes, it looks like it will boot now. So removing that dangling /dev/mapper/luks-... part, which is not the value of any parameter, seems to solve the problem. I can post logs Monday.

Comment 9 James Patterson 2015-04-27 07:29:02 UTC
Created attachment 1019254 [details]
/var/log/anaconda from kvm

Comment 10 Vratislav Podzimek 2015-04-27 12:52:44 UTC
Could you please try to reproduce the issue again? I've just run a new F22 installation with Btrfs+swap on top of LUKS and everything went fine. No dangling stuff in grub.cfg, system boots.

Comment 11 James Patterson 2015-04-27 13:18:24 UTC
Raid-1 or not raid-1? (Did you use my kickstart config?)

I've reproduced this twice on real metal and once in a VM. I'd rather not install again. In all cases I installed using the full iso, not netinstall.

Comment 12 David Lehman 2015-04-27 17:02:55 UTC
I just did an install with F22-Beta-TC5 with encrypted btrfs raid1 and encrypted swap. It booted successfully.

If you have a kickstart you should attach it.

Comment 13 James Patterson 2015-04-27 18:18:42 UTC
Aha there might be the problem then: I installed the later release, Fedora-Live-Workstation-x86_64-22_Beta-3.iso

I'll do a fourth install tomorrow and get a kickstart file.

Comment 14 Chris Murphy 2015-04-27 19:23:21 UTC
Is reproducible with:
Fedora-Live-Workstation-x86_64-22-20150427.iso
anaconda-22.20.9-1
python-blivet-1.0.7-1

The problem is as described in comment 7, there is a break after LANG=en_US.UTF-8 which means /dev/mapper/luks... is on a new line, so root= isn't defined, and possibly the initramfs isn't defined either, so I end up with a kernel panic. But I'm going to bet dollars to donuts this is either a grub2-mkconfig or grubby bug.

Comment 15 Chris Murphy 2015-04-27 19:26:19 UTC
Created attachment 1019416 [details]
screenshot, grub edit

Comment 16 Chris Murphy 2015-04-27 19:27:43 UTC
Created attachment 1019417 [details]
screenshot, grub can't find command

Comment 17 Chris Murphy 2015-04-27 19:31:52 UTC
OK cute, there's no initrd16 line in my grub.cfg. So there are multiple problems going on here.

Comment 18 Chris Murphy 2015-04-27 19:42:35 UTC
Created attachment 1019420 [details]
anaconda-ks-cfg reproducing c14

Comment 19 Chris Murphy 2015-04-27 19:43:58 UTC
Created attachment 1019421 [details]
anaconda.log, c14

Comment 20 Chris Murphy 2015-04-27 19:44:14 UTC
Created attachment 1019422 [details]
program.log, c14

Comment 21 Chris Murphy 2015-04-27 19:44:31 UTC
Created attachment 1019423 [details]
storage.log, c14

Comment 22 Chris Murphy 2015-04-27 19:51:51 UTC
Created attachment 1019424 [details]
grub.cfg, c14

a.) "LANG=en_US.UTF-8" is added by grubby

b.) "/dev/mapper/luks-ddf0ce3a-b6b8-4d77-93bf-cdd7d1f4d2a0" is unqualified and makes zero sense there all by itself; it was probably added by grubby, because the rescue entry doesn't contain it, and it may be preceded by the CR that's causing this bug.

c.) Lacks initrd16 line.

d.) If I use grub2-mkconfig to create a new grub.cfg, things work fine (the system boots and has none of the above anomalies).

So in any case this looks like a grubby bug.
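
For illustration, a minimal sketch of how the dangling device line described in (b) could be spotted in a generated grub.cfg. The function name find_dangling_device_lines and the sample entry are hypothetical (the LUKS names are placeholders, not values from the attached file); the only tool assumed is grep.

```shell
#!/bin/sh
# Hypothetical helper: print "<line number>:<text>" for every line in the
# given file whose first non-blank token is a /dev/ path. In a well-formed
# grub.cfg a device path appears only as an argument, never at line start.
find_dangling_device_lines() {
    grep -nE '^[[:space:]]*/dev/' "$1"
}

# Made-up sample modeled on the broken entry; luks-aaaa/luks-bbbb are
# placeholders.
sample=$(mktemp)
printf '%s\n' \
    'linux16 /vmlinuz root=/dev/mapper/luks-aaaa LANG=en_US.UTF-8' \
    '/dev/mapper/luks-bbbb ro rootflags=subvol=root' > "$sample"

find_dangling_device_lines "$sample"
rm -f "$sample"
```

The function returns a non-zero status when no such line exists, so it can also be used as a quick check in a script.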

Comment 23 Chris Murphy 2015-04-27 20:04:26 UTC
Nope, not strictly a grubby bug. After fixing grub.cfg with grub2-mkconfig and then installing a new kernel from koji, grubby creates a proper working entry that boots the system. So my current guess is that there's something about the install environment that causes grub2-mkconfig to write out a bad grub.cfg, which then confuses grubby.

Comment 24 James Patterson 2015-04-27 20:17:09 UTC
I'm betting it's btrfs related: btrfs handles the raid-1 by itself.

Comment 25 Chris Murphy 2015-04-27 22:10:09 UTC
The missing initrd line always happens, see new bug 1215839.

This btrfs/luks bug with the bogus <CR> between LANG=en_US.UTF-8\n
/dev/mapper/luks-... ro rootflags= requires more investigation.

Comment 26 Chris Murphy 2015-04-28 02:50:40 UTC
100% reproducible with today's, 22 beta TC4, and Fedora 21 final builds (lives):
1. Boot install media, launch installer.
2. Installation destination: select 2x drives, custom partitioning, encrypt.
3. Click the blue "automatically create" link.
4. Optionally, delete the swap mount point.
5. Click the / mount point, Volume > Modify, set RAID Level to RAID 1.
6. Done.
7. Passphrase.
8. Begin installation.
9. Optionally, do not enter a root or user password; this makes it possible to see the grub.cfg state prior to grubby modification.

Observations:
a. The grub.cfg is already malformed before grubby touches it;

b. It's malformed in a couple of ways. One, it uses root=/dev/blockdevice notation instead of root=UUID=<volumeUUID> notation. Notice that the rescue entries correctly use root=UUID=<volumeUUID> notation, and do not try to define root= with two LUKS block devices.

c. Ultimately it's the CR in between the two LUKS block devices that causes the dangling line between the linux16 line and the (missing) initrd16 line; that bogus dangling line contains rootflags=subvol=root, which is necessary for proper mounting of the Btrfs subvolume, and that is why boot fails.

The question is why the primary kernel entry isn't created using root=UUID=<volumeUUID> notation, as it is for the rescue entry, or as it is when rerunning grub2-mkconfig after the initramfs is created (after step 9 above, entering a password and clicking Finish configuration).

If grub2-mkconfig should use root=UUID=<volumeUUID> absent an initramfs, then this bug is a grub2 bug. If grub2-mkconfig as shipped by Fedora isn't expected to be run prior to the existence of an initramfs, then this is an anaconda bug.

In any case it does appear to be a blocker, which could be fixed by anaconda re-running grub2-mkconfig a 2nd time after the initramfs is created.
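
A minimal sketch of the "run grub2-mkconfig only once an initramfs exists" idea, under stated assumptions: the helper names has_initramfs and regenerate_grub_cfg are hypothetical, the initramfs-*.img pattern follows the file name quoted elsewhere in this bug, and the grub2/grub.cfg output path is the usual BIOS location on Fedora. This is not what anaconda does; it only shows the ordering being proposed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one initramfs image exists in the
# given boot directory (default /boot).
has_initramfs() {
    for img in "${1:-/boot}"/initramfs-*.img; do
        [ -e "$img" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: regenerate grub.cfg only when an initramfs is
# present, so that grub2-mkconfig does not fall back to raw device names.
# Must be run as root on a real system; it is not invoked here.
regenerate_grub_cfg() {
    boot_dir=${1:-/boot}
    if has_initramfs "$boot_dir"; then
        grub2-mkconfig -o "$boot_dir/grub2/grub.cfg"
    else
        echo "no initramfs in $boot_dir; not regenerating grub.cfg" >&2
        return 1
    fi
}
```

On an installed system that already has the bad grub.cfg, calling regenerate_grub_cfg as root would correspond to the manual fix described in comment 22 (d).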

Comment 27 Chris Murphy 2015-04-28 03:41:39 UTC
Created attachment 1019478 [details]
debug grub2-mkconfig, without initramfs, c26

/boot/initramfs-3.17.4-301.fc21.x86_64.img does NOT exist when running
# bash -x grub2-mkconfig
Also changed 10_linux to set -exv

Comment 28 Chris Murphy 2015-04-28 03:42:23 UTC
Created attachment 1019479 [details]
debug grub2-mkconfig, with initramfs, c26

/boot/initramfs-3.17.4-301.fc21.x86_64.img does exist when running
# bash -x grub2-mkconfig
Also changed 10_linux to set -exv

Comment 29 Chris Murphy 2015-04-28 05:18:39 UTC
/etc/grub.d/10_linux
    # "UUID=" and "ZFS=" magic is parsed by initrd or initramfs.  Since there's
    # no initrd or builtin initramfs, it can't work here.

That explains the reason grub2-mkconfig tries to use two roots; the initramfs is missing.

And I've reproduced this behavior without LUKS. Again, it must be live media; the problem doesn't happen with netinstall.

Btrfs raid1 (no swap, no encrypt) has the same boot failure; the grub.cfg linux16 line:

	linux16 /vmlinuz-4.0.0-1.fc22.x86_64 root=/dev/vda2
/dev/vdb1 ro rootflags=subvol=root rhgb quiet

First failure is GRUB complains:
error: can't find command /dev/vdb1

Second failure is systemd Failed to Switch Root, detail "Failed to switch root: Specified switch root path /sysroot does not seem to be an OS tree. os-release file is missing." And inside /sysroot are the home and root subvolumes, because the rootflags=subvol=root is on a bogus separate line that isn't understood.

Fails exactly the same way in the 20150427 build, 22 Beta TC4, Fedora 21, and Fedora 20. It did work in Fedora 19. (Again, only live images were tested.)
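
The split linux16 line can be reproduced in isolation. The sketch below is a simulation, not grub2 code: it assumes the root device variable holds the two member devices separated by a newline (which is consistent with the grub.cfg output shown above), and it uses a simplified stand-in for how the value gets interpolated into the kernel line.

```shell
#!/bin/sh
# Simulated value: two member devices, one per line. On a real system this
# would come from grub2-mkconfig's device probe, which is not run here.
GRUB_DEVICE='/dev/vda2
/dev/vdb1'

# Simplified stand-in for building the kernel line; the embedded newline is
# carried through unchanged and splits the line in two.
kernel_line="linux16 /vmlinuz-4.0.0-1.fc22.x86_64 root=${GRUB_DEVICE} ro rootflags=subvol=root rhgb quiet"

printf '%s\n' "$kernel_line"
```

The second printed line starts with /dev/vdb1, which GRUB then tries to run as a command, and rootflags=subvol=root never reaches the kernel.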

Comment 30 David Gay 2015-04-28 20:59:38 UTC
Discussed at the 2015-04-28 blocker review meeting.[1] Voted as RejectedBlocker.

Btrfs RAID is a fairly uncommon configuration, and there is a simple workaround: using the non-live installer. We believe that documenting this workaround is sufficient to deal with this case.

[1]: http://meetbot.fedoraproject.org/fedora-blocker-review/2015-04-28/

Comment 31 David Shea 2015-12-02 22:45:52 UTC
Please include an initrd in the live image. The kernel postinstall scripts generate one, but it is removed in spin-kickstarts.

The problem is that grub2-mkconfig checks whether an initrd is going to be used for boot, and based on that determines whether the root= parameter should be based on UUID or device name. This bit here, in 10_linux:

  if test -n "${initrd}" ; then
    gettext_printf "Found initrd image: %s\n" "${dirname}/${initrd}" >&2
  elif test -z "${initramfs}" ; then
    # "UUID=" and "ZFS=" magic is parsed by initrd or initramfs.  Since there's
    # no initrd or builtin initramfs, it can't work here.
    linux_root_device_thisversion=${GRUB_DEVICE}
  fi


In this case, $GRUB_DEVICE is actually several devices. An initrd is necessary to actually decrypt the device and set everything up anyway, but from grub's point of view there isn't going to be an initrd, because at that point an initrd does not exist. Please make an initrd exist. I know that it's going to just be regenerated anyway, but we have to at least let everything know that it's going to exist.
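
To make the decision concrete, here is a simplified sketch of the selection described above. It is not the 10_linux code: choose_root_arg is a hypothetical function, the UUID is a made-up placeholder, and the real script consults more settings than are modeled here.

```shell
#!/bin/sh
# Hypothetical helper, not part of grub2: pick the value for root=.
#   $1 = "yes" if an initrd/initramfs will be used, anything else otherwise
#   $2 = filesystem UUID
#   $3 = device list for the root filesystem (may span several lines)
choose_root_arg() {
    if [ "$1" = "yes" ]; then
        # An initrd can resolve UUID=, so a single stable token is emitted.
        printf 'root=UUID=%s\n' "$2"
    else
        # Without an initrd the raw device list is used as-is.
        printf 'root=%s\n' "$3"
    fi
}

devices='/dev/vda2
/dev/vdb1'

# "0000-placeholder" is an invented UUID for illustration only.
choose_root_arg yes 0000-placeholder "$devices"
choose_root_arg no  0000-placeholder "$devices"
```

With an initrd the function yields one root=UUID=... token; without one, a multi-device Btrfs volume yields a value that spans two lines, which is the malformed entry seen in this bug.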

Comment 32 Kevin Fenzi 2015-12-05 19:45:56 UTC
commit fd4b26341c66b2b6e0db79a90351fb4d373ddab1
Author: Kevin Fenzi <kevin>
Date:   Sat Dec 5 12:44:57 2015 -0700

    Don't nuke the initramfs by default. Needed for bug 1214441

diff --git a/fedora-live-base.ks b/fedora-live-base.ks
index fb201d3..8032870 100644
--- a/fedora-live-base.ks
+++ b/fedora-live-base.ks
@@ -305,8 +305,6 @@ rm -f /var/lib/rpm/__db*
 # go ahead and pre-make the man -k cache (#455968)
 /usr/bin/mandb
 
-# save a little bit of space at least...
-rm -f /boot/initramfs*
 # make sure there aren't core files lying around
 rm -f /core*

Comment 33 David Shea 2016-03-23 21:13:51 UTC
*** Bug 1246906 has been marked as a duplicate of this bug. ***

