Bug 1753154 - grub2 blscfg menu order can become random
Summary: grub2 blscfg menu order can become random
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: 38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nicolas Frayer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
Reported: 2019-09-18 09:19 UTC by Warren Togami
Modified: 2023-06-14 09:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1753157 (view as bug list)
Environment:
Last Closed: 2020-11-24 18:22:20 UTC
Type: Bug
Embargoed:
mattdm: fedora_prioritized_bug?



Description Warren Togami 2019-09-18 09:19:53 UTC
Description of problem:
If you installed your system with Anaconda, then the /boot/loader/entries/*.conf files are named to match /etc/machine-id.

But if you are booting a raw image, the kernel install happened during image creation with a temporary random /etc/machine-id UUID, and the file was blanked afterwards. On the first boot from that image, a new random UUID is generated for /etc/machine-id, which no longer matches the /boot/loader/entries/*.conf filenames.

The system seems to boot just fine, but when a new kernel is installed (and /etc/sysconfig/kernel is missing UPDATEDEFAULT=yes), the installation fails to explicitly write grubenv's saved_entry=<new kernel's full BLS name>.

The consequence is that headless cloud instances or embedded boards will randomly reboot into either the new or the old kernel. This is because GRUB defaults to the zeroth menu entry, while the entries in /boot/loader/entries are ordered by blscfg with rpmvercmp().

https://fedorapeople.org/~wtogami/rpmvercmp3.py
$ ./rpmvercmp3.py 3a0ec5d722d8490895ed0715bcf68280 61dcccd9652d4a02b08ae324222cb5d4
61dcccd9652d4a02b08ae324222cb5d4 is newer than 3a0ec5d722d8490895ed0715bcf68280

blscfg is comparing two random UUIDs exactly as intended.
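
To make the ordering concrete, here is a minimal bash sketch of the segment comparison that rpmvercmp() performs. It is a simplification for illustration only, not the code that rpm or blscfg uses: the function name rpmvercmp_sketch is made up here, and the special handling of '~' and '^' is left out.

```shell
#!/bin/bash
# Simplified sketch of rpmvercmp(): prints 1 if $1 is newer, -1 if $2 is
# newer, 0 if they compare equal. Assumption: '~' and '^' rules are omitted.
rpmvercmp_sketch() {
  local LC_ALL=C                      # byte-wise comparison, like strcmp()
  local a=$1 b=$2 sa sb
  local sep='^[^[:alnum:]]+' num='^[0-9]+' alpha='^[[:alpha:]]+'
  while [[ -n $a && -n $b ]]; do
    # Drop leading separators such as '.', '-' or '_'.
    if [[ $a =~ $sep ]]; then a=${a#"${BASH_REMATCH[0]}"}; fi
    if [[ $b =~ $sep ]]; then b=${b#"${BASH_REMATCH[0]}"}; fi
    if [[ -z $a || -z $b ]]; then break; fi
    if [[ $a =~ $num ]]; then
      sa=${BASH_REMATCH[0]}
      # A numeric segment is always newer than an alphabetic one.
      if ! [[ $b =~ $num ]]; then echo 1; return; fi
      sb=${BASH_REMATCH[0]}
      a=${a#"$sa"}; b=${b#"$sb"}
      # Ignore leading zeros; after that, more digits means a larger number.
      while [[ $sa == 0* ]]; do sa=${sa#0}; done
      while [[ $sb == 0* ]]; do sb=${sb#0}; done
      if (( ${#sa} > ${#sb} )); then echo 1; return; fi
      if (( ${#sa} < ${#sb} )); then echo -1; return; fi
    else
      if [[ $a =~ $alpha ]]; then sa=${BASH_REMATCH[0]}; fi
      # An alphabetic segment is always older than a numeric one.
      if ! [[ $b =~ $alpha ]]; then echo -1; return; fi
      sb=${BASH_REMATCH[0]}
      a=${a#"$sa"}; b=${b#"$sb"}
    fi
    # Same segment type and (for digits) same length: compare as strings.
    if [[ $sa > $sb ]]; then echo 1; return; fi
    if [[ $sa < $sb ]]; then echo -1; return; fi
  done
  # Whichever string still has characters left is considered newer.
  if [[ -n $a ]]; then echo 1; elif [[ -n $b ]]; then echo -1; else echo 0; fi
}

# The two machine-ids from this report: the first segments are "3" and "61".
rpmvercmp_sketch 3a0ec5d722d8490895ed0715bcf68280 61dcccd9652d4a02b08ae324222cb5d4   # prints -1
```

The first hex digits alone decide the result, so the menu order depends on which random id happens to start with the larger number, not on the kernel version.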

Another consequence is that removing the original kernel does not delete its BLS entry file, because the filename does not match the current machine-id.
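
A quick way to spot affected systems is to list the entry files whose prefix differs from the current machine-id. The following is a minimal sketch; list_mismatched_bls_entries is a hypothetical helper name, and it assumes entry files are named <32 hex digits>-<rest>.conf as described above.

```shell
#!/bin/bash
# Print the names of BLS entry files in directory $1 whose machine-id
# prefix differs from the id given in $2. Files that do not start with a
# 32-hex-digit id are ignored.
list_mismatched_bls_entries() {
  local dir=$1 machine_id=$2 path name
  local re='^([0-9a-f]{32})-'
  for path in "$dir"/*.conf; do
    [ -e "$path" ] || continue        # glob matched nothing
    name=${path##*/}
    if [[ $name =~ $re ]] && [[ ${BASH_REMATCH[1]} != "$machine_id" ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example (on a real system):
#   list_mismatched_bls_entries /boot/loader/entries "$(cat /etc/machine-id)"
```

Any file it prints would be left behind when its kernel package is removed.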

Version-Release number of selected component (if applicable):
grub2-efi-aa64-2.02-97.fc31.aarch64
appliance-tools-009.0-7.fc31.noarch
systemd-udev-243-1.fc31.aarch64

Mitigation:
Image creators like appliance-tools and imagefactory should probably write out /etc/sysconfig/kernel. Even then, reboot behavior only becomes closer to user expectations after kernel-install is run again. This only bypasses the random menu ordering; the underlying problem still needs to be fixed.
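
As an illustration of that mitigation, an image build step could make sure the setting is present. This is a sketch under assumptions: ensure_updatedefault is a hypothetical helper, and it only handles the UPDATEDEFAULT=yes line mentioned above; any other settings an installer would normally put in this file are out of scope.

```shell
#!/bin/bash
# Make sure the given config file (default: /etc/sysconfig/kernel) contains
# UPDATEDEFAULT=yes, replacing an existing UPDATEDEFAULT= line if present.
ensure_updatedefault() {
  local conf=${1:-/etc/sysconfig/kernel} tmp
  mkdir -p "$(dirname "$conf")"
  touch "$conf"
  if grep -q '^UPDATEDEFAULT=' "$conf"; then
    tmp=$(mktemp)
    sed 's/^UPDATEDEFAULT=.*/UPDATEDEFAULT=yes/' "$conf" > "$tmp"
    cat "$tmp" > "$conf"              # keep the original file's permissions
    rm -f "$tmp"
  else
    echo 'UPDATEDEFAULT=yes' >> "$conf"
  fi
}

# Example, e.g. from a kickstart %post section:
#   ensure_updatedefault /etc/sysconfig/kernel
```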

Possible Fixes:
* Stop including the machine-id in the /boot/loader/entries/*.conf filenames.
* Images could include a one-time script that runs during initial boot. After /etc/machine-id is written, the files in /boot/loader/entries/ can be renamed to match.

RHEL8 also needs to be fixed. BLS Cloud boot can behave in unexpected ways as headless machines can't show the boot menu to the user. This can be very confusing as reboot and grub2-reboot do not do what you expect. It could also prevent a system from rebooting into a new kernel containing a security patch.
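
On a headless machine, one way to see what the next boot is likely to pick is to check whether saved_entry is set and still names an existing entry file. The helpers below are a hypothetical sketch: they assume KEY=VALUE lines as printed by "grub2-editenv list", and they assume saved_entry holds a BLS entry name (it can also hold a menu index or title, which this sketch would report as stale).

```shell
#!/bin/bash
# Read KEY=VALUE lines on stdin and print the value of saved_entry, if any.
saved_entry_from_env() {
  sed -n 's/^saved_entry=//p' | head -n 1
}

# Classify a saved_entry value ($2) against the entries directory ($1):
#   unset - no saved_entry at all, so GRUB falls back to its default entry
#   ok    - <saved_entry>.conf exists
#   stale - saved_entry names a file that is not there
saved_entry_status() {
  local entries_dir=$1 saved=$2
  if [ -z "$saved" ]; then
    echo unset
  elif [ -e "$entries_dir/$saved.conf" ]; then
    echo ok
  else
    echo stale
  fi
}

# Example (on a real system):
#   saved=$(grub2-editenv list | saved_entry_from_env)
#   saved_entry_status /boot/loader/entries "$saved"
```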

Comment 1 Warren Togami 2019-09-23 19:14:45 UTC
#!/bin/bash

# https://bugzilla.redhat.com/show_bug.cgi?id=1753154
# /boot/loader/entries/*.conf can be incorrect because machine-id is 
#   set only after first boot when installed from an image.
# This script renames *.conf files to match /etc/machine-id.

MACHINEID=$(cat /etc/machine-id)
cd /boot/loader/entries || exit 1
# Entry names look like <32-hex-digit id>-<rest>.conf; skip anything else.
RE='^([0-9a-f]{32})-(.+\.conf)$'
for filename in *.conf; do
  [[ $filename =~ $RE ]] || continue
  FILEID=${BASH_REMATCH[1]}
  REMAIN=${BASH_REMATCH[2]}
  if [ "$MACHINEID" != "$FILEID" ]; then
    echo "Renaming ${FILEID}-${REMAIN} to ${MACHINEID}-${REMAIN} to work around rhbz #1753154"
    # Quote the names so unusual characters cannot split the arguments.
    mv -- "${FILEID}-${REMAIN}" "${MACHINEID}-${REMAIN}"
  fi
done

Comment 2 Warren Togami 2019-09-25 21:49:40 UTC
https://github.com/wtogami/example-efi-blscfg-imagecreator/

Here are example kickstart snippets for images that work around this bug. They work equally well on Fedora 31 and CentOS 8.

%post
### script - fixup-bls-entry-name
cat << 'EOF' > /usr/bin/fixup-bls-entry-name
#!/bin/bash

# https://bugzilla.redhat.com/show_bug.cgi?id=1753154
# /boot/loader/entries/*.conf can be incorrect because machine-id is
#   set only after first boot when installed from an image.
# This script renames *.conf files to match /etc/machine-id.

MACHINEID=$(cat /etc/machine-id)
cd /boot/loader/entries || exit 1
# Entry names look like <32-hex-digit id>-<rest>.conf; skip anything else.
RE='^([0-9a-f]{32})-(.+\.conf)$'
for filename in *.conf; do
  [[ $filename =~ $RE ]] || continue
  FILEID=${BASH_REMATCH[1]}
  REMAIN=${BASH_REMATCH[2]}
  if [ "$MACHINEID" != "$FILEID" ]; then
    echo "Renaming ${FILEID}-${REMAIN} to ${MACHINEID}-${REMAIN} to work around rhbz #1753154"
    # Quote the names so unusual characters cannot split the arguments.
    mv -- "${FILEID}-${REMAIN}" "${MACHINEID}-${REMAIN}"
  fi
done
EOF
chmod 755 /usr/bin/fixup-bls-entry-name

### run-once@.service
cat << EOF > '/etc/systemd/system/run-once@.service'
[Unit]
DefaultDependencies=yes

[Service]
Type=exec
ExecStart=-/usr/bin/%i
ExecStartPost=-/usr/bin/systemctl disable run-once@%i.service
Restart=no
StandardOutput=journal+console
StandardError=journal+console

[Install]
WantedBy=sysinit.target
EOF
chmod 644 '/etc/systemd/system/run-once@.service'

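### run-once during boot: fix up BLS entry names to work around https://bugzilla.redhat.com/show_bug.cgi?id=1753154
# The instance name after '@' is the script that run-once@.service starts via /usr/bin/%i.
systemctl enable run-once@fixup-bls-entry-name.service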
%end

Comment 3 Warren Togami 2019-09-27 13:16:59 UTC
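Fixed a race condition: the unit is now a oneshot that is explicitly ordered after local-fs.target and systemd-machine-id-commit.service.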

https://github.com/wtogami/example-efi-blscfg-imagecreator/commit/6b1e3caa3654f3857118316bce1435297ae17a0c

### run-once@.service
cat << EOF > '/etc/systemd/system/run-once@.service'
[Unit]
DefaultDependencies=no
After=local-fs.target systemd-machine-id-commit.service
Before=sysinit.target shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
ExecStart=-/usr/bin/%i
ExecStartPost=-/usr/bin/systemctl disable run-once@%i.service
StandardOutput=journal+console
StandardError=journal+console

[Install]
WantedBy=sysinit.target
EOF
chmod 644 '/etc/systemd/system/run-once@.service'
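
The ordering above presumably exists so that the script only runs once /etc/machine-id holds its final value. As an extra safeguard, the script itself could refuse to rename anything until the id looks valid. This is only a sketch; is_valid_machine_id is a hypothetical helper and is not part of the script in the linked repository.

```shell
#!/bin/bash
# Succeed only if $1 looks like a machine-id: exactly 32 lowercase hex digits.
# An empty file (as shipped in a blanked image) or any placeholder text fails.
is_valid_machine_id() {
  local re='^[0-9a-f]{32}$'
  [[ $1 =~ $re ]]
}

# Possible use at the top of the rename script:
#   MACHINEID=$(cat /etc/machine-id)
#   is_valid_machine_id "$MACHINEID" || exit 0
```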

Comment 4 Ben Cotton 2020-11-03 15:34:20 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2020-11-24 18:22:20 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 6 Ben Cotton 2021-02-09 15:12:34 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 7 Fedora Admin user for bugzilla script actions 2021-05-07 00:34:56 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 8 Ben Cotton 2022-02-08 21:09:19 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 9 Fedora Admin user for bugzilla script actions 2023-02-02 00:13:16 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 10 Fedora Admin user for bugzilla script actions 2023-04-25 00:10:47 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 11 Ben Cotton 2023-04-25 16:39:31 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 12 Warren Togami 2023-04-29 00:08:58 UTC
This is still an issue as of Fedora 38.

Comment 13 Warren Togami 2023-04-29 00:47:02 UTC
We've been using the above workaround in our production images for years. It still works on EL9.


