Bug 1779611
Summary: | System fails to boot after upgrade to grub2-efi-x64-1:2.02-103.fc31.x86_64 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tim Cuthbertson <tim> | ||||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 31 | CC: | airlied, amdunn, amitosh.swain, bskeggs, carl, dc.hart, fmartine, glavposhtamt, hdegoede, ichavero, itamar, jan.public, jarodwilson, javierm, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, laget, leblondthi, linville, lkundrak, loic.yhuel, ltuan, masami256, mchehab, miguel.horlle, mihai, mjg59, pjones, rafaeltscs, rickhg12hs, sgoel01, steved | ||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | grub2-2.02-104.fc31 kernel-5.6.13-200.fc31 kernel-5.6.13-300.fc32 kernel-5.6.13-100.fc30 | Doc Type: | If docs needed, set a value | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2020-05-20 03:15:05 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Tim Cuthbertson
2019-12-04 11:22:52 UTC
By setting debug=all you can get debug output. To do that, go to a GRUB prompt by pressing the c key when you get to the boot menu, and then execute:

grub> set debug=all

Then press Esc to get back to the boot menu. Alternatively, you can set this in your grubenv file from user space with:

$ grub2-editenv - set debug=all

Can you also check whether the entries in the boot menu are correctly generated? You can do this by pressing the e key and checking that the kernel, initramfs and command line parameters are correct.

Created attachment 1642258 [details]
debug=all grub output
Thanks for the tip. I've attached the end of the output (from debug=all) as a literal screenshot - it gets here and then no further. Nothing seems alarming about this (or the rest of the output), though I don't know exactly what I'm looking for. I did the same for the version that correctly boots, and couldn't spot anything different. I couldn't quite check the end though since it boots immediately upon reaching EOF. I checked the paths in the boot entry - for both versions, the paths are the same, and they point to valid paths. So I don't think that's the problem. (In reply to Tim Cuthbertson from comment #3) > Thanks for the tip. I've attached the end of the output (from debug=all) as > a literal screenshot - it gets here and then no further. > > Nothing seems alarming about this (or the rest of the output), though I > don't know exactly what I'm looking for. I did the same for the version that > correctly boots, and couldn't spot anything different. I couldn't quite > check the end though since it boots immediately upon reaching EOF. > > I checked the paths in the boot entry - for both versions, the paths are the > same, and they point to valid paths. So I don't think that's the problem. Did you check if your kernel command line params was correct? (by pressing 'e' for any entry in the boot menu) (In reply to Tim Cuthbertson from comment #0) [snip] > > The version installed with my fedora 31 initial install works, and is: > > - grub2-efi-x64-1:2.02-100.fc31.x86_64 > > (I haven't tried the `-102` version to see whether it works, as that never > made it to stable) > > Did you try it with version -101 ? FEDORA-2019-1635a1541a has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-1635a1541a I've just tried with version 102 (by extracting that single .efi file from the downloaded RPM), and that boots correctly. So I assume 101 would as well. I also see that you published a 104 on bodhi. 
I extracted the .efi file in the same way, but unfortunately that version does not boot. (In reply to Tim Cuthbertson from comment #7) > I've just tried with version 102 (by extracting that single .efi file from > the downloaded RPM), and that boots correctly. So I assume 101 would as well. > > I also see that you published a 104 on bodhi. I extracted the .efi file in > the same way, but unfortunately that version does not boot. That's strange. I dropped the only patch that changed anything in GRUB. The only change between 102 and 104 is a fix for the /etc/grub.d/10_reset_boot_success script (that shouldn't even affect you unless you re-generate your grub.cfg file). So I wonder if this may be an issue of a change with the toolchain or something with your firmware. Well, that's terrifying :/ Do you mean the compiler (gcc?) toolchain, or is it more complicated for grub? I'd be willing to experiment with building the grubx64.efi file with different versions of GCC, if that would help track down the issue? I am comfortable with building software in docker or nix (as in NixOS) for reproducible build environments, though I assume there's some fedora specific tools to build rpms reliably. Is it easy to vary the GCC version in use with fedora's tools, or does it implicitly depend on my system GCC? Or would it easier to get koji to do a few builds against different GCC versions and just have me try out the built results? If it's firmware-related, I assume that couldn't be a _change_ in my firmware right? As in, swapping out different versions of grubx64.efi wouldn't have any effect on my firmware would it? grub2-2.02-87.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. 
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-69da274284 grub2-2.02-104.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1635a1541a (for the record, I already tried out the above version with no success) If it is a toolchain change, I assume that would have to be something in the diff of the buildroots used in the respective builds? I checked out the x86_64 build `root.log` from koji's 102 and 103 builds, respectively: https://kojipkgs.fedoraproject.org//packages/grub2/2.02/102.fc31/data/logs/x86_64/root.log https://kojipkgs.fedoraproject.org//packages/grub2/2.02/103.fc31/data/logs/x86_64/root.log That has a package installation list, and `diff` shows that the only differing package installed is `python-pip-wheel`, which changed from 19.1.1-5.fc31 to 19.1.1-6.fc31. That seems super unlikely to be the culprit. Is there any other possible difference between the builds? grub2-2.02-104.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report. reopening (In reply to Tim Cuthbertson from comment #14) > reopening Thanks, this bug is mentioned in the update and that's why the Fedora Update System though that was fixed. I'm very puzzled by your issue. 
Answering your question about building different grub2 versions, you could use the following commands:

$ fedpkg clone --anonymous grub2 && pushd grub2
$ fedpkg switch-branch f31
$ fedpkg mockbuild
$ rpm2cpio results_grub2/2.02/104.fc31/grub2-efi-x64-2.02-104.fc31.x86_64.rpm | cpio -idmv
$ cp boot/efi/EFI/fedora/grubx64.efi /boot/efi/EFI/fedora/

Well, this just makes things more puzzling :/

Thanks for your instructions, I've built the latest grub f31 locally (corresponding to the 104 patchset). It boots fine. I built the 102 and 103 patchsets too, and both of those also boot fine. So there's nothing wrong with the patches themselves. And to confirm my sanity, I redownloaded the .efi files from the published 103 and 104 RPMs, and checked that both of those (still) fail to boot.

In a separate experiment, since I'm pretty handy with nix, I set up nix expressions to duplicate as much as I could inside a reproducible nix build. That is, taking the nixpkgs grub2 expression (https://github.com/NixOS/nixpkgs/blob/8af07181d05e9aff10558847f92bdb9aea18d322/pkgs/tools/misc/grub/2.0x.nix), and modifying it to be based on upstream 2.02, and to include all the patches from the srpm. I built a nix version of the 102 and 103 patchsets, but unfortunately neither booted. Surprisingly, a version based on `master` (of https://src.fedoraproject.org/rpms/grub2.git, which corresponds to rawhide I guess?) does boot.

This could definitely be a red herring: there are probably plenty of ways the fedpkg and nix builds differ, and this could be due to some bug which would never manifest in a fedpkg build. But it _is_ at least reproducible. I'm guessing the main difference is that it's been rebased on grub 2.04.

So for one last check, I downloaded f32 versions -5 and -6 (corresponding to f31 -103 and -104, I assume). These both booted successfully.
So to summarize:

- official fedora builds of 102: SUCCESS
- official fedora builds of 103, 104: FAILED
- local (fedpkg) builds of 102, 103 and 104: SUCCESS
- nix builds of 102, 103: FAILED
- nix build of HEAD (rawhide?): SUCCESS
- official fedora f32 versions 5 and 6: SUCCESS

So it seems like it's time to give up, and that my best strategy is to just not update grub, and hope that when f32 rolls around the grub 2.04 versions still work for my machine and I can resume updates. Thanks for all your help, feel free to re-apply the 103 patchset since that clearly wasn't the actual issue.

Thanks a lot for doing all the testing. You could try to compare the binaries from the official -103 and your own -103 build to check if there's any difference that could explain your boot issue.

Interesting, we also have bug 1769063 open about 5.3 kernels not booting on a Dell Inspiron 5567. The debugging done there also points to grub in a way. It seems that any grub version + a 5.2 kernel works fine, whereas a 5.3 kernel will only boot with the F31 gold/release grub version. After installing F31 the system boots fine, then after a "dnf update" the system no longer boots unless a 5.2 kernel is selected.

Tim, can you try installing a 5.2 kernel, say:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1393085
See:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt
for some instructions on how to install a kernel directly from koji. I wonder if that kernel does boot with one of the "FAIL" grub versions.

Interesting. I tried that with the -104 version of grub, which previously failed, but does succeed when booting the kernel 5.2.18 version installed from that link. So yes, this looks like the same issue. I too am on a Dell Inspiron:

pew
    description: Desktop Computer
    product: Inspiron 3668 (0763)
    vendor: Dell Inc.
    serial: H7RXFL2
    width: 64 bits
    capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
    configuration: boot=normal chassis=desktop family=Inspiron sku=0763 uuid=44454C4C-3700-1052-8058-C8C04F464C32
  *-core
       description: Motherboard
       product: 07KY25
       vendor: Dell Inc.
       physical id: 0
       version: A00
       serial: /H7RXFL2/CNWS20077Q007B/
     *-firmware
          description: BIOS
          vendor: Dell Inc.
          physical id: 0
          version: 1.3.4
          date: 06/20/2017
          size: 64KiB
          capacity: 16MiB
          capabilities: pci pnp upgrade shadowing cdboot bootselect edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot uefi
     *-memory
          description: System Memory
          physical id: 9
          slot: System board or motherboard
          size: 16GiB

Javier, I forgot to reply to your last message: is there some way to compare the resulting binaries? (`diff` doesn't do anything useful). The obvious thing I notice is that the nix-built versions are significantly smaller, which may mean they're simply lacking some optional feature, or maybe debuginfo? The locally-built ones are a consistent size which is a few hundred bytes bigger than the koji-built variants, so diffing those might be more productive.

-rwx------. 1 tim tim 2271560 Oct 10 13:26 100.efi*
-rwx------. 1 tim tim 2271560 Oct 16 16:50 101.efi*
-rwx------. 1 tim tim 2271560 Nov 26 15:53 102.efi*
-rwx------. 1 tim tim 2271560 Nov 27 20:46 103.efi*
-rwx------. 1 tim tim 2271560 Dec  5 20:59 104.efi*
-rwx------. 1 tim tim 2500936 Dec 12 10:16 fc32-5.efi*
-rwx------. 1 tim tim 2500936 Dec 12 10:17 fc32-6.efi*
-rwx------. 1 tim tim 2271992 Dec 12 08:56 local-102.efi*
-rwx------. 1 tim tim 2271992 Dec 12 09:39 local-103.efi*
-rwx------. 1 tim tim 2271992 Dec 12 08:57 local-104.efi*
-rwx------. 1 tim tim 1449984 Dec  9 21:10 nix-102.efi*
-rwx------. 1 tim tim 1449984 Dec  9 21:17 nix-103.efi*
-rwx------. 1 tim tim 1548288 Dec  8 17:41 nix-HEAD.efi*

(I can upload them somewhere if you want to take a poke yourself...)
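On the question of comparing the binaries: `diff` only reports that two binary files differ, so one option is a byte-level comparison that prints the offset ranges where the files disagree. The following is a minimal Python sketch of that idea, not something that was run as part of this bug. The function names `diff_ranges` and `compare_files` are made up for illustration.

```python
from typing import Iterator, Tuple


def diff_ranges(a: bytes, b: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) half-open offset ranges where a and b differ.

    Only the common length of the two inputs is compared; a difference in
    total size has to be reported separately by the caller.
    """
    common = min(len(a), len(b))
    start = None
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset  # a new differing run begins here
        elif start is not None:
            yield (start, offset)  # the run ended just before this offset
            start = None
    if start is not None:
        yield (start, common)


def compare_files(path_a: str, path_b: str) -> None:
    """Print the sizes of both files and every differing offset range."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    print(f"sizes: {len(a)} vs {len(b)}")
    for start, end in diff_ranges(a, b):
        print(f"differs at 0x{start:X}-0x{end:X} ({end - start} bytes)")
```

With the files from the listing above, a call such as `compare_files("103.efi", "local-103.efi")` would print the size of each file followed by the differing ranges. The pure-Python loop is slow but adequate for files of a couple of megabytes.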
I have seen the same issue on my Dell Inspiron 3668 desktop machine running Fedora 30. No 5.3 Fedora kernel, including the latest 5.3.16, will boot: each ends up with a blank screen when selected from the grub menu, but kernel 5.2.18 boots fine. Here is the relevant machine and package info:

BIOS Information
    Vendor: Dell Inc.
    Version: 1.7.0
    Release Date: 01/25/2018
System Information
    Manufacturer: Dell Inc.
    Product Name: Inspiron 3668

grub2/kernel package info:

grub2-common-2.02-84.fc30.noarch
grub2-efi-x64-2.02-84.fc30.x86_64
grub2-pc-2.02-84.fc30.x86_64
grub2-pc-modules-2.02-84.fc30.noarch
grub2-tools-2.02-84.fc30.x86_64
grub2-tools-efi-2.02-84.fc30.x86_64
grub2-tools-extra-2.02-84.fc30.x86_64
grub2-tools-minimal-2.02-84.fc30.x86_64
kernel-5.2.18-200.fc30.x86_64
kernel-5.3.11-200.fc30.x86_64
kernel-5.3.16-200.fc30.x86_64
kernel-core-5.2.18-200.fc30.x86_64
kernel-core-5.3.11-200.fc30.x86_64
kernel-core-5.3.16-200.fc30.x86_64
kernel-debug-devel-5.2.18-200.fc30.x86_64
kernel-debug-devel-5.3.11-200.fc30.x86_64
kernel-debug-devel-5.3.16-200.fc30.x86_64
kernel-devel-5.2.18-200.fc30.x86_64
kernel-devel-5.3.11-200.fc30.x86_64
kernel-devel-5.3.16-200.fc30.x86_64
kernel-headers-5.3.11-200.fc30.x86_64
kernel-modules-5.2.18-200.fc30.x86_64
kernel-modules-5.3.11-200.fc30.x86_64
kernel-modules-5.3.16-200.fc30.x86_64
kernel-modules-extra-5.2.18-200.fc30.x86_64
kernel-modules-extra-5.3.11-200.fc30.x86_64
kernel-modules-extra-5.3.16-200.fc30.x86_64
kernel-tools-5.3.9-200.fc30.x86_64
kernel-tools-libs-5.3.9-200.fc30.x86_64

Same issue with Dell Inspiron 15 Gaming 7567, except the system resets when trying to boot a kernel (tested with 5.3 and 5.4).
grub2-efi-x64-2.02-100.fc31.x86_64 => OK
grub2-efi-x64-2.02-104.fc31.x86_64 => reset
grubx64.efi of local build of 2.02-104.fc31 using "fedpkg local" => OK

The only differences between the local and the official 2.02-104 grubx64.efi are:
- Security Directory size (offset 0x12C): 0x00000AF8 (local) => 0x00000948 (official)
- The Security Directory itself, starting at 0x22A000

The official version is signed by "Fedora Secure Boot Signer", while the local build uses "Red Hat Test Certificate". The grubx64.efi is almost the same in the official 2.02-100 and 2.02-104 rpms, with the same signing certificate and the same size. In my case secure boot is disabled, so the certificate shouldn't matter; shim probably doesn't check it. The fact that both official binaries have the same size should exclude any size-related bugs in shim or the UEFI. Is there a way to check whether the grubx64.efi has been correctly loaded in memory?

*** Bug 1769063 has been marked as a duplicate of this bug. ***

*** Bug 1786481 has been marked as a duplicate of this bug. ***

The issue seems to happen when shim loads the "bad" grub:
- shim -> local-104 -> Linux => OK
- shim -> official-104 -> Linux => reset
- shim -> local-104 -> chainloader official-104 -> Linux => OK
- shim -> official-104 -> chainloader local-104 -> Linux => reset
- efi shell -> official-104 -> Linux => OK

So perhaps something is being corrupted in memory when shim loads the "bad" grub, which causes an issue when starting the kernel.

I have been pondering two other possibilities. I am wondering if this is associated with an SSD. I am also wondering if this has something to do with the MBR which includes - for most people - a sustained Windows record. The problematic machine is now a secondary laptop. Over the weekend I want to do an sgdisk -Z /dev/sdx and then reinstall to see if that makes a difference. To that end I have created a working Linux system on a stick. I have no problem with a more current i7-based Dell 5584.
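For anyone wanting to repeat the Security Directory comparison from the grubx64.efi analysis above, the fields can be read straight from the PE header. The sketch below is a hedged illustration, not a tool used in this bug: `security_directory` is a hypothetical helper name, and it relies only on the standard PE layout (the PE header offset stored at 0x3C, a 20-byte COFF header, and the data directories at optional-header offset 112 for PE32+ or 96 for PE32). If the PE header of grubx64.efi starts at 0x80, which is an assumption about that binary, the size field lands at 0x12C, matching the offset reported above.

```python
import struct
from typing import Tuple

SECURITY_DIRECTORY_INDEX = 4  # IMAGE_DIRECTORY_ENTRY_SECURITY in the PE format


def security_directory(image: bytes) -> Tuple[int, int]:
    """Return (file_offset, size) of the certificate table of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    optional_header = pe_offset + 4 + 20  # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic == 0x20B:    # PE32+ (64-bit images such as grubx64.efi)
        directories = optional_header + 112
    elif magic == 0x10B:  # PE32
        directories = optional_header + 96
    else:
        raise ValueError(f"unknown optional header magic 0x{magic:X}")
    # Each data directory entry is two little-endian 32-bit values.
    entry = directories + 8 * SECURITY_DIRECTORY_INDEX
    offset, size = struct.unpack_from("<II", image, entry)
    return offset, size
```

For the security entry, unlike the other data directories, the first value is a plain file offset rather than an RVA, so the signature blob can be sliced out of the file directly with `image[offset:offset + size]`.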
On my Dell 5567 I have the exact issue with NO SSD so I don't think it's that. I do have a Windows partition as well (which the laptop came with, preinstalled with Windows) resized down before creating the Linux partitions and installing Fedora dual boot - so your second possibility could be the case. (and clearly the system is in secure boot mode, not legacy)

A very similar issue reported on stackexchange (https://superuser.com/q/1514051/256629) seems to have a temporary remedy by disabling TPM.

My situation with the Dell Inspiron 3668 is very similar to yours in that it has an HDD and came with Windows preinstalled. I shrank the Windows partition to install Linux on the machine. My machine boots EFI but secure mode is disabled.

(In reply to Andy from comment #27)
> On my Dell 5567 I have the exact issue with NO SSD so I don't think it's
> that.
>
> I do have a Windows partition as well (which the laptop came with,
> preinstalled with Windows) resized down before creating the Linux partitions
> and installing Fedora dual boot - so your second possibility could be the
> case.
>
> (and clearly the system is in secure boot mode, not legacy)

Can someone who sees this working with 5.2 but not 5.3+ boot with "efi=debug earlyprintk=efi,keep" and no "rhgb quiet" on the kernel command line, and try to get some photos of the boot process? This will be incredibly slow and tedious, but has a chance of showing us some useful debug information.

(In reply to Peter Jones from comment #30)
> Can someone who sees this working with 5.2 but not 5.3+ boot with "efi=debug
> earlyprintk=efi,keep" and no "rhgb quiet" on the kernel command line, and
> try to get some photos of the boot process? This will be incredibly slow
> and tedious, but has a chance of showing us some useful debug information.

I assume you mean the boot process of 5.3+ ?
With 5.2.0-1.fc31 it boots fine, but the traces are way too fast (and I don't know if there is something on screen which isn't normal dmesg output). With 5.4.7-200.fc31, it resets without printing anything (or maybe it's too fast, perhaps someone who has a freeze instead of a reboot would see something). However, I can confirm disabling the TPM works, it's called "PTT" in my UEFI Setup. With no TPM, I can boot kernel-5.4.7-200.fc31 with the official grub2-2.02-104.fc31. Here is a "dmesg | grep -i tpm" on 5.4.5-300.fc31.x86_64 booted with TPM (and obviously the local grub build) : > efi: ACPI=0x78649000 ACPI 2.0=0x78649000 SMBIOS=0xf05e0 SMBIOS 3.0=0xf0600 TPMFinalLog=0x78c8f000 ESRT=0x79360598 MPS=0xfcbe0 TPMEventLog=0x68207018 > [Firmware Bug]: Failed to parse event in TPM Final Events Log > ACPI: TPM2 0x000000007867A9D8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000) > tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80 > tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80 Note that the "[Firmware Bug]: Failed to parse event in TPM Final Events Log" isn't always present, but is there most of the time. Here is the same with 5.2.0-1.fc31.x86_64, booting from the affected grub : > efi: ACPI=0x78649000 ACPI 2.0=0x78649000 SMBIOS=0xf05e0 SMBIOS 3.0=0xf0600 ESRT=0x79360598 MPS=0xfcbe0 TPMEventLog=0x68207018 > ACPI: TPM2 0x000000007867A9D8 000034 (v03 Tpm2Tabl 00000001 AMI 00000000) > tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80 > tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. 
[mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80 There is no TPMFinalLog, I see this was added in 5.3 : "tpm: Reserve the TPM final events table". So perhaps the TPM Final Events log is incorrect, which might cause an issue when reading it. But what would the different grub change ? Would they trigger different TPM logs (between the official 2.02-100 and the local 2.02-104 which work, and the official 2.02-104 which has the issue) when they are booted by shim ? From the 5.2.0 boot, if that helps : > efi: mem29: [ACPI Memory NVS | | | | | | | | |WB|WT|WC|UC] range=[0x000000007867e000-0x0000000078fe0fff] (9MB) Definitely something to this TPM theory... I can confirm that my Dell Inspiron 5567 boots 5.3.16-200 correctly when the TPM/PTT is turned off with GRUB2 1:2.02-87.fc30, and silently fails to boot 5.3.x when the TPM/PTT is turned on. AAAND.... With "efi=debug earlyprintk=efi,keep" added and "rhgb quiet" removed... The ONLY output produced before the freeze when trying to boot 5.3.16-200 is: EFI stub: UEFI Secure Boot is enabled. Nothing else. I can also confirm that disabling PTT on my Dell Inspiron 3668 desktop allows kernel 5.4.7-100 to boot successfully but leaving it still results in a blank screen as with the earlier 5.3 kernels. (In reply to Shantanu Goel from comment #34) > I can also confirm that disabling PTT on my Dell Inspiron 3668 desktop > allows kernel 5.4.7-100 to boot successfully but leaving it still results in > a blank screen as with the earlier 5.3 kernel Sorry for the typo but I meant that leaving PTT enabled in the BIOS still results in a blank screen with the 5.4 kernel as it does with 5.3. I can confirm that disabling PTT on my Dell Inspiron 15 Gaming 7567 laptop allows 5.4.7-100.fc30.x86_64 to boot. Thanks! Dell Inspiron 5567: The 5.3 kernel does not load on any of the distribution kit. LiveCD doesn't even show grub. 
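The 5.2 versus 5.4 comparison of the `efi:` dmesg lines quoted earlier (where TPMFinalLog only appears on the newer kernel) can be automated when checking other machines. This is a small Python sketch under the assumption that the line has the `NAME=0xADDR` shape shown in those excerpts; `parse_efi_tables` is a hypothetical helper, not an existing tool.

```python
from typing import Dict


def parse_efi_tables(line: str) -> Dict[str, int]:
    """Parse an 'efi: NAME=0xADDR ...' dmesg line into {name: address}.

    Table names may contain a space (for example 'ACPI 2.0'), so a token
    without '=' is kept as the leading word of the next name.
    """
    body = line.split("efi:", 1)[1]
    tables: Dict[str, int] = {}
    pending = []  # words seen since the last NAME=ADDR token
    for token in body.split():
        if "=" in token:
            name_tail, value = token.split("=", 1)
            tables[" ".join(pending + [name_tail])] = int(value, 16)
            pending = []
        else:
            pending.append(token)
    return tables
```

Taking the set difference of the keys from two boots, for example `set(new) - set(old)`, then shows which configuration tables only one of the kernels reported.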
I finally got around to following these instructions myself (running with the v104 version of grubx64.efi from fc31). Unfortunately I didn't have any luck:

Disabling PTT: I pressed "clear" and rebooted, I assume this is what people mean by "disable". It still didn't boot.

Replacing "rhgb quiet" with "efi=debug earlyprintk=efi,keep": I didn't see any log output (and it still didn't boot, obviously). It kept showing the Dell logo from prior to the boot selection screen, so I don't know if I'm missing some extra flags that would show logs properly.

The same is happening on my Dell i15-7567-A30P. I tried running Fedora and Ubuntu from a LiveUSB and it didn't work either. The only thing that makes the system boot is disabling PTT. Kernel 5.2.8 was the last version that worked fine.

Fedora 31 LiveUSB version 1-9 exhibits the same behaviour on Dell Inspiron 5567:
- if PTT enabled, blank screen
- if PTT disabled, normal boot

Another data point (still failing)... reinstalled Fedora, version 31, did all dnf upgrades... still fails with kernel 5.4.18-200 and grub 2.02-105 on Dell Inspiron 5567 if PTT is enabled. Exact same failure. If PTT disabled, normal boot.

For those of you who have tried to boot the non-booting kernels with "efi=debug earlyprintk=efi,keep" on the kernel commandline: I have since learned that those are not the correct options for recent kernels to get early boot debugging messages. The current options are:

efi=debug earlycon=efifb keep_bootcon

Can someone who is seeing this try upgrading grub to a known not working version and then booting a 5.3 or newer kernel with this added to the kernel commandline? Hopefully this will show some output. If it shows some output please take a picture or write down the output and report it here.

Just tried it as Hans requests... no change in output. The only output produced is:

EFI stub: UEFI Secure Boot is enabled.

(In reply to Andy from comment #43)
> Just tried it as Hans requests... no change in output.
> > The only output produced is: > > EFI stub: UEFI Secure Boot is enabled. That is quite unfortunate (no output), thank you for trying. I tried that as well and still no results. Just the same black screen and no output. If I hit the power button, the system shuts down immediately. *** Bug 1779385 has been marked as a duplicate of this bug. *** There have been some changes to kernel 5.7-rc2 which might help. Can someone who is seeing this issue please try this kernel: https://koji.fedoraproject.org/koji/buildinfo?buildID=1497202 See here for generic instructions for installing a kernel directly from koji: https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt *** Bug 1829172 has been marked as a duplicate of this bug. *** Ping? Can anyone who is seeing this issue on their system please keep 5.7-rc2 a try, or even better since it is out now make that 5.7-rc4: https://koji.fedoraproject.org/koji/buildinfo?buildID=1503100 See here for generic instructions for installing a kernel directly from koji: https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt The test instructions say these test kernels are not signed, so secure boot needs to be disabled. This bug only shows up if secure boot is enabled. Or is this particular test kernel signed? (In reply to Carl Byington from comment #50) > The test instructions say these test kernels are not signed, so secure boot > needs to be disabled. This bug only shows up if secure boot is enabled. Or > is this particular test kernel signed? These are official kernel builds for rawhide, which are signed, so you can keep secure-boot enabled. But what makes you say that this only happens with secure-boot enabled? I see no comment(s) about that anywhere in this bug. (In reply to Hans de Goede from comment #49) > Ping? 
Can anyone who is seeing this issue on their system please keep > 5.7-rc2 a try, or even better since it is out now make that 5.7-rc4: > https://koji.fedoraproject.org/koji/buildinfo?buildID=1503100 > I still get the issue with this kernel when the TPM (ie PTT) is enabled, same as 5.6.11-200.fc31.x86_64. Note that in my previous tests with 5.3/5.4 it was a reset, now it's a freeze (but other people had it). Btw, this kernel took forever to install : it launched "weak-modules", which created 125 symlinks in /lib/modules/5.7.0-0.rc4.1.fc33.x86_64/weak-updates/ to the 5.6.11 modules, running depmod over and over. It seems to be https://bugzilla.redhat.com/show_bug.cgi?id=1828455, and will hit anyone with F31/F32 kernel-modules-extra packages. (In reply to Loïc Yhuel from comment #52) > I still get the issue with this kernel when the TPM (ie PTT) is enabled, > same as 5.6.11-200.fc31.x86_64. Bummer, thank you for trying. Created attachment 1686665 [details]
tpm: check event log version before reading final events
The issue happens in efi_retrieve_tpm2_eventlog (efi stub) when parsing the final events table.
Since I suspected something linked to the final events, I bypassed this first read (forcing final_events_table = 0).
Then it happens again in tpm2_calc_event_log_size, but here I can add traces to the code, and get them with "earlycon=efifb keep_bootcon", or by returning early to get a successful boot.
I have final_tbl->version = 1, and final_tbl->nr_events = 31.
Then __calc_tpm2_event_size reads bad values for event->count and efispecid->num_algs, both in the hundreds of millions, which probably makes it loop long enough that it appears frozen.
log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, so I don't know if passing the first entry (log_location / log_tbl->log) to tpm2_calc_event_log_size is correct.
That could explain the bad efispecid->num_algs, if the cast was incorrect.
But the bad event->count suggests the final events table is either bad, or not the expected format.
I see the char driver (drivers/char/tpm/eventlog/efi.c) skips the final log if "tpm_log_version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2".
I attached a patch which does the same for efi_retrieve_tpm2_eventlog and tpm2_calc_event_log_size, but I don't know if this is the correct fix or not.
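To illustrate why a format mismatch could produce such counts, here is a small Python sketch (an illustration only, not the kernel code and not the attached patch). It builds one event in the SHA-1 (TCG 1.2) layout and then reads it the way a crypto-agile (TCG 2) parser would, so the first four digest bytes are taken as the digest count. The helper names are hypothetical, the digest is simply the SHA-1 of the sample data for demonstration, and the two format constants are the values from the TCG EFI Protocol Specification.

```python
import hashlib
import struct

EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 = 0x1  # SHA-1 only log format
EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 = 0x2    # crypto-agile log format


def pack_tcg_1_2_event(pcr_index: int, event_type: int, data: bytes) -> bytes:
    """Build a TCG 1.2 event: pcr, type, 20-byte SHA-1 digest, size, data."""
    digest = hashlib.sha1(data).digest()
    header = struct.pack("<II", pcr_index, event_type)
    return header + digest + struct.pack("<I", len(data)) + data


def read_crypto_agile_count(event: bytes) -> int:
    """Read the digest count as a TCG 2 parser would: pcr, type, count."""
    _pcr, _event_type, count = struct.unpack_from("<III", event, 0)
    return count


def should_parse_final_events(log_version: int) -> bool:
    """Guard in the spirit of the patch: only trust a crypto-agile log."""
    return log_version == EFI_TCG2_EVENT_LOG_FORMAT_TCG_2


if __name__ == "__main__":
    event = pack_tcg_1_2_event(0, 0x7, b"example event data")
    # The "count" is really the first four bytes of the SHA-1 digest.
    print("misread digest count:", read_crypto_agile_count(event))
    print("parse final events for a TCG 1.2 log?",
          should_parse_final_events(EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2))
```

Because the misread count comes from hash output, it is effectively random, so a loop bounded by it can run for a very long time. That is consistent with the freeze described above but does not prove that this is what the firmware's table actually contains.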
(In reply to Loïc Yhuel from comment #54) > Created attachment 1686665 [details] > tpm: check event log version before reading final events > > The issue happens in efi_retrieve_tpm2_eventlog (efi stub) when parsing the > final events table. > Since I suspected something linked to the final events, I bypassed this > first read (forcing final_events_table = 0). > Then it happens again in tpm2_calc_event_log_size, but here I can add traces > to the code, and get them with "earlycon=efifb keep_bootcon", or by > returning early to get a successful boot. > > I have final_tbl->version = 1, and final_tbl->nr_events = 31. > Then __calc_tpm2_event_size reads bad values for event->count and > efispecid->num_algs, both in the hundred of millions, which probably makes > it loop enough it appears frozen. > This is great, thanks a lot for finally figuring out the mystery! > log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, so I don't know if > passing the first entry (log_location / log_tbl->log) to > tpm2_calc_event_log_size is correct. > That could explain the bad efispecid->num_algs, if the cast was incorrect. > But the bad event->count suggests the final events table is either bad, or > not the expected format. > Indeed. The TCG EFI Protocol Specification [0] mentions that the EFI Final Events Table (EFI_TCG2_FINAL_EVENTS_TABLE) will always contain log entries using the crypto agile format (EFI_TCG2_EVENT_LOG_FORMAT_TCG_2) and not the SHA-1 format (EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2). If log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, that means the EFI_TCG2_PROTOCOL.GetEventLog() call for EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 either failed or didn't return any entries. > > I see the char driver (drivers/char/tpm/eventlog/efi.c) skips the final log > if "tpm_log_version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2". > I attached a patch which does the same for efi_retrieve_tpm2_eventlog and > tpm2_calc_event_log_size, but I don't know if this is the correct fix or not. 
Yes, drivers/char/tpm/eventlog/efi.c skips the Final Events Log for EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 since, as mentioned, that's not supported according to the TCG spec. I think this is a firmware bug because it seems there's an EFI Final Events Table even when there seem to be no event logs for EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 (or at least none that can be retrieved by the GetEventLog() EFI service). So I agree with your patch and that the EFI stub shouldn't even attempt to get a Final Events Table if the Event Log only has EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 entries. But I think the change in drivers/firmware/efi/tpm.c isn't necessary, since after your change to skip for EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 in the EFI stub, the .tpm_final_log will have its initial value, which is EFI_INVALID_TABLE_ADDR. Please post your patch to the linux-integrity mailing list.

[0]: https://trustedcomputinggroup.org/wp-content/uploads/EFI-Protocol-Specification-rev13-160330final.pdf

*** Bug 1833148 has been marked as a duplicate of this bug. ***

(In reply to Javier Martinez Canillas from comment #55)
> I think this is a firmware bug because it seems there's an EFI Final Events
> Table even when there seem to be no event logs for
> EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 (or at least none that can be retrieved by
> the GetEventLog() EFI service).

I checked the first entry in the event log: it is EV_S_CRTM_CONTENTS, so not the EV_NO_ACTION which would contain the tcg_efi_specid_event_head needed for __calc_tpm2_event_size. Perhaps the final events table is using the old format here, but that would be out of spec.

> But I think the change in drivers/firmware/efi/tpm.c isn't necessary, since
> after your change to skip for EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 in the EFI
> stub, the .tpm_final_log will have its initial value, which is
> EFI_INVALID_TABLE_ADDR.

I still get the TPMFinalLog=0x78f8f000 log, so I think efi.tpm_final_log is set in drivers/firmware/efi/efi.c regardless of what happened in the efi stub.
> Please post your patch to the linux-integrity mailing list. done FEDORA-2020-4336d63533 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4336d63533 FEDORA-2020-c6b9fff7f8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-c6b9fff7f8 FEDORA-2020-5a69decc0c has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5a69decc0c A combination of kernel 5.6.12 and GRUB 2.04-16 works on my Fedora 32, Dell Inspiron 15 7567 machine. Previously any attempt to boot from the GRUB 2.04 EFI binaries resulted in a system freeze. FEDORA-2020-c6b9fff7f8 has been pushed to the Fedora 31 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-c6b9fff7f8` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-c6b9fff7f8 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. FEDORA-2020-4336d63533 has been pushed to the Fedora 32 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4336d63533` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4336d63533 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. FEDORA-2020-5a69decc0c has been pushed to the Fedora 30 testing repository. 
In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5a69decc0c` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5a69decc0c See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates. Thank you!! kernel 5.6.12 now boots on my Dell Inspiron 15 7000 with the TPM (PTT) enabled. FEDORA-2020-c6b9fff7f8 has been pushed to the Fedora 31 stable repository. If problem still persists, please make note of it in this bug report. FEDORA-2020-4336d63533 has been pushed to the Fedora 32 stable repository. If problem still persists, please make note of it in this bug report. FEDORA-2020-5a69decc0c has been pushed to the Fedora 30 stable repository. If problem still persists, please make note of it in this bug report. |