Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1945901 - boot hang with message [DEPEND] Dependency failed for Cryptography Setup for luks-XXXX...
Summary: boot hang with message [DEPEND] Dependency failed for Cryptography Setup for ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 34
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-04-02 15:35 UTC by Tim Rhodes
Modified: 2021-04-13 01:34 UTC (History)
9 users (show)

Fixed In Version: dracut-053-2.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-13 01:34:27 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
rdsosreport.txt (247.96 KB, text/plain)
2021-04-02 20:07 UTC, Tim Rhodes
no flags Details
journalctl -b -o short-monotonic --no-hostname > works-journal.log from good boot (549.00 KB, text/plain)
2021-04-02 23:18 UTC, Tim Rhodes
no flags Details
dmesg from good boot (80.38 KB, text/plain)
2021-04-02 23:19 UTC, Tim Rhodes
no flags Details
journalctl -b -o short-monotonic --no-hostname > works-journal.log from good boot with rd.debug (2.76 MB, text/plain)
2021-04-02 23:33 UTC, Tim Rhodes
no flags Details
dmesg from good boot with rd.debug (106.32 KB, text/plain)
2021-04-02 23:33 UTC, Tim Rhodes
no flags Details
rdsosreport.txt from failed boot with rd.debug (4.69 MB, text/plain)
2021-04-03 00:24 UTC, Tim Rhodes
no flags Details
journalctl -b -o short-monotonic --no-hostname > fails-journal.log from failed boot in emergency mode shell (4.40 MB, text/plain)
2021-04-03 00:38 UTC, Tim Rhodes
no flags Details
journalctl -b -o short-monotnic --no-hostname with systemd.log_level=debug udev.log-priority=debug (1.83 MB, text/plain)
2021-04-03 01:45 UTC, Tim Rhodes
no flags Details
ls -l /dev in emergency shell (10.22 KB, text/plain)
2021-04-03 11:23 UTC, Tim Rhodes
no flags Details
blkid -k in emergency shell (735 bytes, text/plain)
2021-04-03 11:24 UTC, Tim Rhodes
no flags Details
blkid in emergency shell (670 bytes, text/plain)
2021-04-03 11:24 UTC, Tim Rhodes
no flags Details
mount attempt in emergency shell (113 bytes, text/plain)
2021-04-03 11:25 UTC, Tim Rhodes
no flags Details

Description Tim Rhodes 2021-04-02 15:35:30 UTC
Description of problem:
After dnf update (on 4/2/2021), boot hangs will message [DEPEND] Dependency failed for Cryptography Setup for luks-XXXX...

Version-Release number of selected component (if applicable):
Fedora 34 RC

How reproducible:
Possibly impacting any systems with LUKS encrpypted file systems that have been patched up to 4/2/2021 from Fedora 34 repos.  Steps to reproduce are based on my configuration

Steps to Reproduce:
1. LUKS encrypted file systems (originally set up on Fedora 32)
2. Installed F34 RC on system and encrypted file systems mounted properly duing boot
3. Patched system on 4/2/2021
4. Boot hangs with message included above, after timeouts, fails to recovery shell, NOTE: system does not prompt for encryption password

Actual results:
System fails to recovery shell on boot

Expected results:
System boot proceeds and prompts for encryption password

Additional info:
This worked with the base install and still works and boots when I select the orignal grub boot option/kernel.  The original kernel is 5.11.3-300.fc34.x86_64.  The new kernel that hangs on boot is 5.11.11-300.fc34.x86_64.  This could be a kernel issue or crypto packages, but the failure seems to occur duing the initrd/dracut processing.

Comment 1 Davide Cavalca 2021-04-02 16:14:15 UTC
crypto is just a library and is unrelated to luks. Tentatively reassigning to dracut based on the information provided.

Comment 2 Chris Murphy 2021-04-02 19:27:27 UTC
>System fails to recovery shell on boot

Can you attach the /run/initramfs/rdsosreport.txt that dracut produced when it ran into this problem? This might be a release blocking bug.

Comment 3 Tim Rhodes 2021-04-02 20:07:17 UTC
Created attachment 1768646 [details]
rdsosreport.txt

/run/initramfs/rdsosreport.txt

Comment 4 Chris Murphy 2021-04-02 21:11:15 UTC
OK so the layout looks like it's LVM+LUKS, and it's getting stuck on the encrypted swap LV. But it's both systemd and dracut getting hung up on not finding it:

[   92.033709] fedora systemd[1]: dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device: Job dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device/start timed out.
[   92.034574] fedora systemd[1]: Timed out waiting for device /dev/disk/by-uuid/3355902c-b6d5-48e0-96b8-00f4f6df8930.
[   92.034707] fedora systemd[1]: Dependency failed for Cryptography Setup for luks-3355902c-b6d5-48e0-96b8-00f4f6df8930.

and

[  222.757973] fedora dracut-initqueue[534]: [ -e /dev/disk/by-id/dm-uuid-CRYPT-LUKS?-*3355902cb6d548e096b800f4f6df8930*-* ] || exit 1"

So what does this look like in the *working* case? Can you boot a working menu entry from GRUB, adding the rd.debug boot parameter, boot normally, and then attach a journal? Attach all of it, or snip off anything after Switching root, doesn't matter either way.

journalctl -b -o short-monotonic --no-hostname > works-journal.log

And then attach the log to this bug. Thanks!

Comment 5 Tim Rhodes 2021-04-02 23:18:44 UTC
Created attachment 1768673 [details]
journalctl -b -o short-monotonic --no-hostname > works-journal.log from good boot

Log from good boot

Comment 6 Tim Rhodes 2021-04-02 23:19:20 UTC
Created attachment 1768674 [details]
dmesg from good boot

dmesg fron good boot

Comment 7 Tim Rhodes 2021-04-02 23:27:50 UTC
Missed the rd.debug, collecting that now.

Comment 8 Tim Rhodes 2021-04-02 23:33:00 UTC
Created attachment 1768675 [details]
journalctl -b -o short-monotonic --no-hostname > works-journal.log from good boot with rd.debug

journalctl -b -o short-monotonic --no-hostname > works-journal.log from good boot with rd.debug

Comment 9 Tim Rhodes 2021-04-02 23:33:36 UTC
Created attachment 1768676 [details]
dmesg from good boot with rd.debug

dmesg from good boot with rd.debug

Comment 10 Chris Murphy 2021-04-02 23:47:34 UTC
From the failed boot rdsosreport:

This huge time gap is suspicious.

[    4.554627] fedora kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
[   92.033709] fedora systemd[1]: dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device: Job dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device/start timed out.

None of the LV's are active. Since the LV's are encrypted, cryptsetup fails to unlock because crypto devices aren't found. Somehow though, blkid sees the LV's (as active) and the fact they have LUKS. 

[   92.191475] fedora dracut-initqueue[581]: Scanning devices nvme0n1p2 nvme0n1p3  for LVM logical volumes sysvg/root sysvg/swap
[   92.260545] fedora dracut-initqueue[606]: inactive '/dev/sysvg/root' [40.00 GiB] inherit
[   92.260881] fedora dracut-initqueue[606]: inactive '/dev/sysvg/swap' [<7.61 GiB] inherit
[   92.260881] fedora dracut-initqueue[606]: inactive '/dev/homevg/home' [400.00 GiB] inherit
[  222.754905] fedora dracut-initqueue[534]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[  222.757973] fedora dracut-initqueue[534]: Warning: /lib/dracut/hooks/initqueue/finished/90-crypt.sh: "[ -e /dev/disk/by-id/dm-uuid-CRYPT-LUKS?-*589424836a714b8b933996c540823811*-* ] || exit 1

Comment 11 Tim Rhodes 2021-04-02 23:58:19 UTC
On the failed boot, there are stall in the console message output, there a 1m30s start job that times out, then it reaches a message Starting dracut initqueue hook..., another stall, then a recursion of messages (100's of messages) before it fails to the recovery shell.

Comment 12 Chris Murphy 2021-04-03 00:04:57 UTC
Ok in the working case we see it get activated:

[    5.242944] dracut-initqueue[592]: + lvm lvchange --yes -K -ay --ignorelockingfailure --ignoremonitoring --poll n sysvg/root

I'd like to see the failing boot with rd.debug as well because the rdsosreport doesn't have this level of detail. That is, please do

journalctl -b -o short-monotonic --no-hostname > fails-journal.log  ##from failed boot, with rd.debug parameter.

Comment 13 Tim Rhodes 2021-04-03 00:24:00 UTC
Created attachment 1768678 [details]
rdsosreport.txt from failed boot with rd.debug

rdsosreport.txt from failed boot with rd.debug

Comment 14 Tim Rhodes 2021-04-03 00:26:04 UTC
I did not note and attempt the journalctl in the recovery shell.  I'll try that again.

Comment 15 Tim Rhodes 2021-04-03 00:38:20 UTC
Created attachment 1768690 [details]
journalctl -b -o short-monotonic --no-hostname > fails-journal.log from failed boot in emergency mode shell

journalctl -b -o short-monotonic --no-hostname > fails-journal.log from failed boot in emergency mode shell

Comment 16 Chris Murphy 2021-04-03 01:18:28 UTC
Yeah in the fails case it's super late:

[   92.334735] dracut-initqueue[584]: + lvm lvchange --yes -K -ay --ignorelockingfailure --ignoremonitoring --poll n sysvg/root

And there is still a huge gap with nothing in the log:

[   12.768554] systemd-udevd[402]: Worker [425] exited
[   91.947152] systemd[1]: dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device: Job dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device/start timed out.


Another possible factor is this is systemd or a negative interaction between systemd(-udev) and dracut because dracut isn't the only thing that changes in the failing case:


The working case has an older version of systemd in the initramfs.

[    1.864220] systemd[1]: systemd v248~rc2-6.fc34 running in system mode.

The failing case has the release version of systemd in the initramfs.

[    1.882006] systemd[1]: systemd v248-1.fc34 running in system mode.


Can you retry the failing case, instead of rd.debug, use both systemd.log_level=debug udev.log-priority=debug

And name it something like

journalctl -b -o short-monotonic --no-hostname > fails-moredebug-journal.log

Comment 17 Tim Rhodes 2021-04-03 01:45:45 UTC
Created attachment 1768694 [details]
journalctl -b -o short-monotnic --no-hostname  with  systemd.log_level=debug udev.log-priority=debug

journalctl -b -o short-monotnic --no-hostname  with  systemd.log_level=debug udev.log-priority=debug

This is a smaller file from the prev data collection.  Here's what /proc/cmdline looked like in the emregency shell...

BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.11.11-300.fc34.x86_64 root=UUID=5ae50f89-3dda-4dee-a085-b7307a61e67f ro resume=/dev/mapper/luks-3355902c-b6d5-48e0-96b8-00f4f6df8930 rd.luks.uuid=luks-58942483-6a71-4b8b-9339-96c540823811 rd.lvm.lv=sysvg/root rd.luks.uuid=luks-3355902c-b6d5-48e0-96b8-00f4f6df8930 rd.lvm.lv=sysvg/swap rhgb quiet systemd.log_level=debug udev.log-priority=debug

Comment 18 Tim Rhodes 2021-04-03 01:47:15 UTC
Need to break for the night, I'll check back tomorrow AM.

Comment 19 Chris Murphy 2021-04-03 02:26:51 UTC
[    9.091335] systemd-udevd[406]: Worker [445] exited
[   92.118585] systemd[1]: dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device: Job dev-disk-by\x2duuid-3355902c\x2db6d5\x2d48e0\x2d96b8\x2d00f4f6df8930.device/start timed out.

I'm beginning to think this may be a systemd bug.

Comment 20 Chris Murphy 2021-04-03 06:39:05 UTC
What happens if you pick the failing boot entry but modify to remove both 'rhgb quiet' parameters? Do you get a 'please enter passphrase for disk luks...." line? And if you enter the luks passphrase here, does boot proceed?

Comment 21 Tim Rhodes 2021-04-03 11:23:12 UTC
Removed rhgb and quiet, but it does not prompt for the decryption key.

I unpacked the working and failing initramfs and have been reviewing contents.  It seems the password prompt is in ./usr/lib/dracut-crypt-lib.sh.  It's not prompting for password, so not calling the function in the script or not reaching/running this script.

I started wondering about blkid determining the devices.  While in the energency shell I ran a blkid -k and it supports crypto_LUKS in the dracut version of this executable.  While in the emergency shell I collected an 'ls -l /dev' and 'blkid -k' output.  These files are uploaded to the case.  I ran 'blkid' by itself and found that it reports both the device parition and the device mapper device.  I tried mounting the /dev/mapper device and it fails.  The output is also loaded to this case.  I did not think at the time and do not have a 'ls -l /dev/mapper'. 

I'm tied up, away from the system until 10AM EDT or so, but will check back when I return.

Comment 22 Tim Rhodes 2021-04-03 11:23:52 UTC
Created attachment 1768791 [details]
ls -l /dev in emergency shell

ls -l /dev in emergency shell

Comment 23 Tim Rhodes 2021-04-03 11:24:31 UTC
Created attachment 1768792 [details]
blkid -k in emergency shell

blkid -k in emergency shell

Comment 24 Tim Rhodes 2021-04-03 11:24:57 UTC
Created attachment 1768793 [details]
blkid in emergency shell

blkid in emergency shell

Comment 25 Tim Rhodes 2021-04-03 11:25:43 UTC
Created attachment 1768794 [details]
mount attempt in emergency shell

mount attempt in emergency shell

Comment 26 Tim Rhodes 2021-04-03 17:10:42 UTC
A few mote notes, I got back into the failed boot emergency shell.  The ls -l /dev/mapper looks like the correct entries, like my system booted with the original kernel.  lsmod shows that dm_crypt is not loaded.  I modprobe'd dm_crypt and it loaded and attempted to mount /dev/mapper/sysvg-root and still got mount: /mnt: unknown filesystem type 'crypto_LUKS'.  On my running system, lsmod shows dm_crypt used by count is 3, but in the emergency shell it's 0.  Googling, showed various approaches of use cryptsetup to manually mount a LUKS encrypted volume, but cryptsetup is not in the emergency shell environment.

Comment 27 Chris Murphy 2021-04-04 02:01:41 UTC
OK I'm thinking this bug is the same as 1945596. The layout is LVM->LUKS in both cases, and it looks to me like none of the LVM LV's get activated, thus crypto device(s) never appear and then we end up with:

[   91.643732] systemd[1]: Dependency failed for Cryptography Setup for luks-f4a5c59c-b893-4b89-af80-20c2f4360103.

Comment 28 Chris Murphy 2021-04-04 02:03:36 UTC
Sorry, bad paste in c27

[   92.120078] systemd[1]: Dependency failed for Cryptography Setup for luks-3355902c-b6d5-48e0-96b8-00f4f6df8930.

Comment 29 Chris Murphy 2021-04-04 02:20:36 UTC
If you get a chance and are willing to take the risk, can you try making the changes in
https://github.com/dracutdevs/dracut/commit/7fcc4955884355c3943d6c41e459b4b983a818f4

And see if that fixes the problem? You would need to rebuild+replace the *bad* initramfs for this change to get incorporated.

Comment 30 Chris Murphy 2021-04-04 04:28:00 UTC
OK never mind, as figured out in bug 1945596, this is the fix.

https://github.com/dracutdevs/dracut/commit/ba4bcf5f4f11ad624c647ddf4f566997186135e7

Comment 31 Tim Rhodes 2021-04-04 12:31:54 UTC
That is the fix.  I modified /usr/lib/dracut/modules.d/35network-manager/nm-run.service with the change in the dracutdevs project and regnerated my initramfs using dracut --regenerate-all --force and the current/latest kernel (5.11.11-300.fc34.x86_64) works.  Thank you for you effort and attention.

Comment 32 Pekka Korkki 2021-04-06 15:27:57 UTC
Comment 30 fixed booting for me as well.

Comment 33 Andrey 2021-04-09 06:55:22 UTC
Same bug. Fixed with comment 30.

Comment 34 Fedora Update System 2021-04-09 17:25:19 UTC
FEDORA-2021-50707f8501 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-50707f8501

Comment 35 Fedora Update System 2021-04-13 01:34:27 UTC
FEDORA-2021-50707f8501 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.