Bug 1779611 - System fails to boot after upgrade to grub2-efi-x64-1:2.02-103.fc31.x86_64
Summary: System fails to boot after upgrade to grub2-efi-x64-1:2.02-103.fc31.x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1769063 1779385 1786481 1833148
Depends On:
Blocks:
 
Reported: 2019-12-04 11:22 UTC by Tim Cuthbertson
Modified: 2020-05-20 03:48 UTC
CC List: 36 users

Fixed In Version: grub2-2.02-104.fc31 kernel-5.6.13-200.fc31 kernel-5.6.13-300.fc32 kernel-5.6.13-100.fc30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 03:15:05 UTC
Type: Bug
Embargoed:


Attachments
debug=all grub output (5.30 MB, image/jpeg)
2019-12-05 01:20 UTC, Tim Cuthbertson
tpm: check event log version before reading final events (1.67 KB, patch)
2020-05-09 01:42 UTC, Loïc Yhuel

Description Tim Cuthbertson 2019-12-04 11:22:52 UTC
Unfortunately I have no idea how to get any diagnostics out, but through trial and elimination I've determined that grub2-efi-x64-1:2.02-103.fc31.x86_64 makes my system unbootable. I reach the grub selection screen, but no matter which kernel I select or which arguments I pass, once I press return I get a blank screen and nothing more.

The version installed with my fedora 31 initial install works, and is:

 - grub2-efi-x64-1:2.02-100.fc31.x86_64

(I haven't tried the `-102` version to see whether it works, as that never made it to stable)

Testing was difficult, but the end result which leads me to this conclusion is:

 - fresh install, /boot/efi contents as of v100, boots fine
 - installed 103 (a handful of grub2-* packages all upgraded together), doesn't boot
 - copied a backup of /boot/efi from v100 back into /boot/efi, it boots. So the problem is definitely with EFI somewhere
 - did an MD5 diff of the contents as of 100/103, saw that only 4 files differ:

v100:
cb22c6afe5df6d1380bf3101d2f22a73  /boot/efi/EFI/fedora/gcdia32.efi
526c6c0d0772ebe2cdbd7555207d75d3  /boot/efi/EFI/fedora/gcdx64.efi
a560f2104e0b964eca0575bdb17fe12b  /boot/efi/EFI/fedora/grubia32.efi
2c577856b90956b12d622fb53554979d  /boot/efi/EFI/fedora/grubx64.efi

v103:
ba1ab018d37181d0e214a78eba5dce17  /boot/efi/EFI/fedora/gcdia32.efi
64be39d2b6c662b34887567c86203fed  /boot/efi/EFI/fedora/gcdx64.efi
e7726d5a602b051ea1b0f17438e8310f  /boot/efi/EFI/fedora/grubia32.efi
d2f131a5a46f2250d63ffbc4f91538a2  /boot/efi/EFI/fedora/grubx64.efi
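The manifest comparison above can be scripted. This Python sketch is a hypothetical helper (not part of any Fedora tooling); it assumes you have a backup copy of the v100 /boot/efi tree, hashes both trees with MD5 and reports files whose digests differ or that exist in only one tree.

```python
import hashlib
import os


def md5_tree(root):
    """Map each file under `root` (as a path relative to `root`) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                manifest[os.path.relpath(full, root)] = hashlib.md5(f.read()).hexdigest()
    return manifest


def changed_files(old, new):
    """Sorted relative paths whose digest differs, including files found in only one tree."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))
```

For example, `changed_files(md5_tree("/root/efi-backup-v100"), md5_tree("/boot/efi"))`, where the backup path is hypothetical, would be expected to list the four EFI/fedora/gcd*.efi and grub*.efi files shown above.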

Assuming it's the x64 one that matters on my system, I copied the old (v100) grubx64.efi over an otherwise v103 /boot contents. That boots.

To ensure it wasn't my copying process that caused issues, I did one more round trip: I copied the v103 grubx64.efi into /boot/efi/EFI/fedora/ and that failed to boot. Then I booted into recovery, copied the v100 version of that file back, and it booted once again.

So, it's that package and specifically that file which is the culprit (for me). I don't know how to track down the actual problem further, advice welcome.

Comment 1 Javier Martinez Canillas 2019-12-04 11:57:22 UTC
By setting debug=all you can get debug output. For that go to a GRUB prompt by pressing the c key when you get into the boot menu and then execute:

grub> set debug=all

And then press Esc to get again into the boot menu.

Alternatively you can set this in your grubenv file from user space with:

$ grub2-editenv - set debug=all

Can you also check if the entries in the boot menu are correctly generated? You can do this by pressing the e key and check if the kernel, initramfs and command line parameters are correct.

Comment 2 Tim Cuthbertson 2019-12-05 01:20:13 UTC
Created attachment 1642258
debug=all grub output

Comment 3 Tim Cuthbertson 2019-12-05 01:23:02 UTC
Thanks for the tip. I've attached the end of the output (from debug=all) as a literal screenshot - it gets here and then no further.

Nothing seems alarming about this (or the rest of the output), though I don't know exactly what I'm looking for. I did the same for the version that correctly boots, and couldn't spot anything different. I couldn't quite check the end though since it boots immediately upon reaching EOF.

I checked the paths in the boot entry - for both versions, the paths are the same, and they point to valid paths. So I don't think that's the problem.

Comment 4 Javier Martinez Canillas 2019-12-05 09:29:28 UTC
(In reply to Tim Cuthbertson from comment #3)
> Thanks for the tip. I've attached the end of the output (from debug=all) as
> a literal screenshot - it gets here and then no further.
> 
> Nothing seems alarming about this (or the rest of the output), though I
> don't know exactly what I'm looking for. I did the same for the version that
> correctly boots, and couldn't spot anything different. I couldn't quite
> check the end though since it boots immediately upon reaching EOF.
> 
> I checked the paths in the boot entry - for both versions, the paths are the
> same, and they point to valid paths. So I don't think that's the problem.

Did you check whether your kernel command line parameters were correct? (by pressing 'e' for any entry in the boot menu)

Comment 5 Javier Martinez Canillas 2019-12-05 09:33:51 UTC
(In reply to Tim Cuthbertson from comment #0)

[snip]

> 
> The version installed with my fedora 31 initial install works, and is:
> 
>  - grub2-efi-x64-1:2.02-100.fc31.x86_64
> 
> (I haven't tried the `-102` version to see whether it works, as that never
> made it to stable)
> 
>

Did you try it with version -101 ?

Comment 6 Fedora Update System 2019-12-05 10:13:07 UTC
FEDORA-2019-1635a1541a has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-1635a1541a

Comment 7 Tim Cuthbertson 2019-12-05 11:27:15 UTC
I've just tried with version 102 (by extracting that single .efi file from the downloaded RPM), and that boots correctly. So I assume 101 would as well.

I also see that you published a 104 on bodhi. I extracted the .efi file in the same way, but unfortunately that version does not boot.

Comment 8 Javier Martinez Canillas 2019-12-05 13:18:15 UTC
(In reply to Tim Cuthbertson from comment #7)
> I've just tried with version 102 (by extracting that single .efi file from
> the downloaded RPM), and that boots correctly. So I assume 101 would as well.
> 
> I also see that you published a 104 on bodhi. I extracted the .efi file in
> the same way, but unfortunately that version does not boot.

That's strange. I dropped the only patch that changed anything in GRUB.

The only change between 102 and 104 is a fix for the /etc/grub.d/10_reset_boot_success script (that shouldn't even affect you unless you re-generate your grub.cfg file).

So I wonder if this may be caused by a change in the toolchain, or by something in your firmware.

Comment 9 Tim Cuthbertson 2019-12-06 23:38:56 UTC
Well, that's terrifying :/

Do you mean the compiler (gcc?) toolchain, or is it more complicated for grub?

I'd be willing to experiment with building the grubx64.efi file with different versions of GCC, if that would help track down the issue? I am comfortable with building software in docker or nix (as in NixOS) for reproducible build environments, though I assume there's some fedora specific tools to build rpms reliably. Is it easy to vary the GCC version in use with fedora's tools, or does it implicitly depend on my system GCC?

Or would it be easier to get koji to do a few builds against different GCC versions and just have me try out the built results?

If it's firmware-related, I assume that couldn't be a _change_ in my firmware right? As in, swapping out different versions of grubx64.efi wouldn't have any effect on my firmware would it?

Comment 10 Fedora Update System 2019-12-07 02:19:36 UTC
grub2-2.02-87.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-69da274284

Comment 11 Fedora Update System 2019-12-07 03:38:51 UTC
grub2-2.02-104.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-1635a1541a

Comment 12 Tim Cuthbertson 2019-12-08 10:07:22 UTC
(for the record, I already tried out the above version with no success)

If it is a toolchain change, I assume that would have to be something in the diff of the buildroots used in the respective builds?

I checked out the x86_64 build `root.log` from koji's 102 and 103 builds, respectively:

https://kojipkgs.fedoraproject.org//packages/grub2/2.02/102.fc31/data/logs/x86_64/root.log
https://kojipkgs.fedoraproject.org//packages/grub2/2.02/103.fc31/data/logs/x86_64/root.log

That has a package installation list, and `diff` shows that the only differing package installed is `python-pip-wheel`, which changed from 19.1.1-5.fc31 to 19.1.1-6.fc31. That seems super unlikely to be the culprit. Is there any other possible difference between the builds?
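Assuming the installed-package NVRs (for example python-pip-wheel-19.1.1-5.fc31, without epoch or arch suffix) have already been extracted from each root.log, the comparison reduces to a set difference keyed on package name. A minimal Python sketch with hypothetical helper names; parsing root.log itself is not shown.

```python
def split_nvr(nvr):
    """Split 'name-version-release' into (name, 'version-release')."""
    name, version, release = nvr.rsplit("-", 2)  # names may themselves contain '-'
    return name, f"{version}-{release}"


def package_changes(old_nvrs, new_nvrs):
    """Map each package whose version-release differs to (old, new); None means absent."""
    old = dict(split_nvr(n) for n in old_nvrs)
    new = dict(split_nvr(n) for n in new_nvrs)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Fed the 102 and 103 package lists, this would be expected to report only python-pip-wheel changing from 19.1.1-5.fc31 to 19.1.1-6.fc31, matching the `diff` result above.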

Comment 13 Fedora Update System 2019-12-09 03:01:27 UTC
grub2-2.02-104.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Tim Cuthbertson 2019-12-09 03:56:59 UTC
reopening

Comment 15 Javier Martinez Canillas 2019-12-11 08:26:11 UTC
(In reply to Tim Cuthbertson from comment #14)
> reopening

Thanks. This bug is mentioned in the update, and that's why the Fedora Update System thought it was fixed.

I'm very puzzled by your issue. Answering your question about building different grub2 versions, you could use the following commands:

$ fedpkg clone --anonymous grub2 && pushd grub2
$ fedpkg switch-branch f31
$ fedpkg mockbuild
$ rpm2cpio results_grub2/2.02/104.fc31/grub2-efi-x64-2.02-104.fc31.x86_64.rpm | cpio -idmv
$ cp boot/efi/EFI/fedora/grubx64.efi /boot/efi/EFI/fedora/

Comment 16 Tim Cuthbertson 2019-12-11 23:34:33 UTC
Well, this just makes things more puzzling :/

Thanks for your instructions. I've built the latest grub f31 locally (corresponding to the 104 patchset), and it boots fine. I built the 102 and 103 patchsets too, and both also boot fine. So there's nothing wrong with the patches themselves. And to confirm my sanity, I redownloaded the .efi files from the published 103 and 104 RPMs and checked that both of those (still) fail to boot.

In a separate experiment, since I'm pretty handy with nix, I set up nix expressions to duplicate as much as I could inside a reproducible nix build. That is, taking the nixpkgs grub2 expression (https://github.com/NixOS/nixpkgs/blob/8af07181d05e9aff10558847f92bdb9aea18d322/pkgs/tools/misc/grub/2.0x.nix), and modifying it to be based on upstream 2.02, and to include all the patches from the srpm.

I built a nix version of the 102 and 103 patchsets, but unfortunately neither booted. Surprisingly a version based on `master` (of https://src.fedoraproject.org/rpms/grub2.git, which corresponds to rawhide I guess?) does boot. This could definitely be a red herring, there are probably plenty of ways the fedpkg and nix builds differ, and this could be due to some bug which would never manifest in a fedpkg build. But it _is_ at least reproducible. I'm guessing the main difference is that it's been rebased on grub 2.04.

So for one last check, I downloaded f32 versions -5 and -6 (corresponding to f31 -103 and -104, I assume). These both booted successfully.

So to summarize:
 - official fedora builds of 102: SUCCESS
 - official fedora builds of 103, 104: FAILED
 - local (fedpkg) builds of 102, 103 and 104: SUCCESS
 - nix builds of 102, 103: FAILED
 - nix build of HEAD (rawhide?): SUCCESS
 - official fedora f32 versions 5 and 6: SUCCESS

So it seems like it's time to give up, and that my best strategy is to just not update grub, and hope that when f32 rolls around the grub 2.04 versions still work for my machine and I can resume updates.

Thanks for all your help, feel free to re-apply the 103 patchset since that clearly wasn't the actual issue.

Comment 17 Javier Martinez Canillas 2019-12-12 15:32:23 UTC
Thanks a lot for doing all the testing. You could try to compare the binaries from the official -103 and your own -103 build to check if there's any difference that could explain your boot issue.

Comment 18 Hans de Goede 2019-12-19 08:52:47 UTC
Interestingly, we also have bug 1769063 open about 5.3 kernels not booting on a Dell Inspiron 5567. The debugging done there also points to grub in a way. It seems that any grub version plus a 5.2 kernel works fine, whereas a 5.3 kernel will only boot with the F31 gold/release grub version. After installing F31 the system boots fine, then after a "dnf update" the system no longer boots unless a 5.2 kernel is selected.

Tim, can you try installing a 5.2 kernel, say https://koji.fedoraproject.org/koji/buildinfo?buildID=1393085 (see https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt for instructions on how to install a kernel directly from koji)? I wonder if that kernel does boot with one of the "FAIL" grub versions.

Comment 19 Tim Cuthbertson 2019-12-19 10:44:57 UTC
Interesting. I tried that with the -104 version of grub, which previously failed but does succeed when booting the 5.2.18 kernel installed from that link. So yes, this looks like the same issue. I too am on a Dell Inspiron:

pew
    description: Desktop Computer
    product: Inspiron 3668 (0763)
    vendor: Dell Inc.
    serial: H7RXFL2
    width: 64 bits
    capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
    configuration: boot=normal chassis=desktop family=Inspiron sku=0763 uuid=44454C4C-3700-1052-8058-C8C04F464C32
  *-core
       description: Motherboard
       product: 07KY25
       vendor: Dell Inc.
       physical id: 0
       version: A00
       serial: /H7RXFL2/CNWS20077Q007B/
     *-firmware
          description: BIOS
          vendor: Dell Inc.
          physical id: 0
          version: 1.3.4
          date: 06/20/2017
          size: 64KiB
          capacity: 16MiB
          capabilities: pci pnp upgrade shadowing cdboot bootselect edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot uefi
     *-memory
          description: System Memory
          physical id: 9
          slot: System board or motherboard
          size: 16GiB


Javier, I forgot to reply to your last message: is there some way to compare the resulting binaries? (`diff` doesn't do anything useful). The obvious thing I notice is that the nix-built versions are significantly smaller, which may mean they're simply lacking some optional feature, or maybe debuginfo? The locally-built ones are a consistent size which is a few hundred bytes bigger than the koji-built variants, so diffing those might be more productive.

-rwx------. 1 tim tim 2271560 Oct 10 13:26 100.efi*
-rwx------. 1 tim tim 2271560 Oct 16 16:50 101.efi*
-rwx------. 1 tim tim 2271560 Nov 26 15:53 102.efi*
-rwx------. 1 tim tim 2271560 Nov 27 20:46 103.efi*
-rwx------. 1 tim tim 2271560 Dec  5 20:59 104.efi*
-rwx------. 1 tim tim 2500936 Dec 12 10:16 fc32-5.efi*
-rwx------. 1 tim tim 2500936 Dec 12 10:17 fc32-6.efi*
-rwx------. 1 tim tim 2271992 Dec 12 08:56 local-102.efi*
-rwx------. 1 tim tim 2271992 Dec 12 09:39 local-103.efi*
-rwx------. 1 tim tim 2271992 Dec 12 08:57 local-104.efi*
-rwx------. 1 tim tim 1449984 Dec  9 21:10 nix-102.efi*
-rwx------. 1 tim tim 1449984 Dec  9 21:17 nix-103.efi*
-rwx------. 1 tim tim 1548288 Dec  8 17:41 nix-HEAD.efi*
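One way to go beyond `diff` for binaries is to list the byte ranges where two images differ, which is essentially what the later comparison of the official and local -104 builds reports. A minimal Python sketch; `differing_ranges` is a hypothetical helper, not an existing tool.

```python
def differing_ranges(a, b):
    """Return half-open (start, end) byte ranges where `a` and `b` differ.

    Bytes beyond the end of the shorter input count as differing, so a size
    change shows up as a trailing range.
    """
    ranges = []
    start = None
    length = max(len(a), len(b))
    for i in range(length):
        same = i < len(a) and i < len(b) and a[i] == b[i]
        if not same and start is None:
            start = i                      # a differing run begins here
        elif same and start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges
```

Reading 104.efi and local-104.efi with `open(path, "rb").read()` and printing each range in hex would show whether the two builds diverge only in the header and the trailing signature area, or also in code sections.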

Comment 20 Tim Cuthbertson 2019-12-19 10:48:14 UTC
(I can upload them somewhere if you want to take a poke yourself...)

Comment 21 Shantanu Goel 2019-12-25 13:15:25 UTC
I have seen the same issue on my Dell Inspiron 3668 desktop machine running Fedora 30. No 5.3 Fedora kernel, including the latest 5.3.16, will boot; each ends up with a blank screen when selected from the grub menu, but kernel 5.2.18 boots fine. Here is the relevant machine and package info:

BIOS Information
        Vendor: Dell Inc.
        Version: 1.7.0
        Release Date: 01/25/2018
System Information
        Manufacturer: Dell Inc.
        Product Name: Inspiron 3668

grub2/kernel package info:

grub2-common-2.02-84.fc30.noarch
grub2-efi-x64-2.02-84.fc30.x86_64
grub2-pc-2.02-84.fc30.x86_64
grub2-pc-modules-2.02-84.fc30.noarch
grub2-tools-2.02-84.fc30.x86_64
grub2-tools-efi-2.02-84.fc30.x86_64
grub2-tools-extra-2.02-84.fc30.x86_64
grub2-tools-minimal-2.02-84.fc30.x86_64
kernel-5.2.18-200.fc30.x86_64
kernel-5.3.11-200.fc30.x86_64
kernel-5.3.16-200.fc30.x86_64
kernel-core-5.2.18-200.fc30.x86_64
kernel-core-5.3.11-200.fc30.x86_64
kernel-core-5.3.16-200.fc30.x86_64
kernel-debug-devel-5.2.18-200.fc30.x86_64
kernel-debug-devel-5.3.11-200.fc30.x86_64
kernel-debug-devel-5.3.16-200.fc30.x86_64
kernel-devel-5.2.18-200.fc30.x86_64
kernel-devel-5.3.11-200.fc30.x86_64
kernel-devel-5.3.16-200.fc30.x86_64
kernel-headers-5.3.11-200.fc30.x86_64
kernel-modules-5.2.18-200.fc30.x86_64
kernel-modules-5.3.11-200.fc30.x86_64
kernel-modules-5.3.16-200.fc30.x86_64
kernel-modules-extra-5.2.18-200.fc30.x86_64
kernel-modules-extra-5.3.11-200.fc30.x86_64
kernel-modules-extra-5.3.16-200.fc30.x86_64
kernel-tools-5.3.9-200.fc30.x86_64
kernel-tools-libs-5.3.9-200.fc30.x86_64

Comment 22 Loïc Yhuel 2020-01-01 04:37:15 UTC
Same issue with Dell Inspiron 15 Gaming 7567, except the system resets when trying to boot a kernel (tested with 5.3 and 5.4).

grub2-efi-x64-2.02-100.fc31.x86_64 => OK
grub2-efi-x64-2.02-104.fc31.x86_64 => reset
grubx64.efi of local build of 2.02-104.fc31 using "fedpkg local" => OK

The only differences between the local and the official 2.02-104 grubx64.efi are :
 - Security Directory size (offset 0x12C) : 0x00000AF8 (local) => 0x00000948 (official)
 - The Security Directory itself, starting at 0x22A000
The official version is signed by "Fedora Secure Boot Signer", while the local build uses "Red Hat Test Certificate".
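The 0x12C offset is consistent with the size field of the Certificate Table (data directory index 4, the "Security Directory") in a PE32+ optional header, provided e_lfanew is 0x80: 0x80 + 4 (PE signature) + 20 (COFF header) + 112 (start of data directories) + 4 * 8 + 4 = 0x12C. This Python sketch reads that entry following the PE/COFF layout; it is an illustration, not code taken from shim or GRUB.

```python
import struct

CERT_TABLE_INDEX = 4  # IMAGE_DIRECTORY_ENTRY_SECURITY in the PE/COFF data directories


def certificate_table(image):
    """Return (file_offset, size, size_field_offset) of the Certificate Table, or None."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)    # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20                                 # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x20B:        # PE32+, as used by x86_64 images such as grubx64.efi
        dirs, count_off = opt + 112, opt + 108
    elif magic == 0x10B:      # PE32
        dirs, count_off = opt + 96, opt + 92
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    (num_dirs,) = struct.unpack_from("<I", image, count_off) # NumberOfRvaAndSizes
    if num_dirs <= CERT_TABLE_INDEX:
        return None
    entry = dirs + 8 * CERT_TABLE_INDEX
    # Unlike other directories, this entry holds a file offset, not an RVA.
    offset, size = struct.unpack_from("<II", image, entry)
    return offset, size, entry + 4
```

Running it on the official and local grubx64.efi would be expected to reproduce the 0x948 and 0xAF8 sizes reported above, with the table starting at 0x22A000 in both.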

The grubx64.efi is almost the same in official 2.02-100 and 2.02-104 rpm, with the same signing certificate, and the same size.

In my case secure boot is disabled, so the certificate shouldn't matter; shim probably doesn't check it.
The fact that both official binaries have the same size should exclude any size-related bugs in shim or the UEFI.


Is there a way to check whether the grubx64.efi has been correctly loaded in memory ?

Comment 23 Hans de Goede 2020-01-03 18:04:04 UTC
*** Bug 1769063 has been marked as a duplicate of this bug. ***

Comment 24 Hans de Goede 2020-01-03 18:04:10 UTC
*** Bug 1786481 has been marked as a duplicate of this bug. ***

Comment 25 Loïc Yhuel 2020-01-03 18:56:00 UTC
The issue seems to happen when shim loads the "bad" grub :
 - shim -> local-104 -> Linux => OK
 - shim -> official-104 -> Linux => reset
 - shim -> local-104 -> chainloader official-104 -> Linux => OK
 - shim -> official-104 -> chainloader local-104 -> Linux => reset
 - efi shell -> official-104 -> Linux => OK

So perhaps something is being corrupted in memory when shim loads the "bad" grub, which causes an issue when starting the kernel.

Comment 26 dc.hart 2020-01-04 00:01:18 UTC
I have been pondering two other possibilities. I am wondering if this is associated with an SSD. I am also wondering if this has something to do with the MBR, which for most people still includes a Windows record. The problematic machine is now a secondary laptop. Over the weekend I want to do an sgdisk -Z /dev/sdx and then reinstall to see if that makes a difference. To that end I have created a working Linux system on a stick.

I have no problem with a more current i7-based Dell 5584.

Comment 27 Andy 2020-01-04 12:37:14 UTC
On my Dell 5567 I have the exact same issue with NO SSD, so I don't think it's that.

I do have a Windows partition as well (which the laptop came with, preinstalled with Windows) resized down before creating the Linux partitions and installing Fedora dual boot - so your second possibility could be the case.

(and clearly the system is in secure boot mode, not legacy)

Comment 28 Rick 2020-01-04 13:32:19 UTC
A very similar issue reported on stackexchange (https://superuser.com/q/1514051/256629) seems to have a temporary remedy by disabling TPM.

Comment 29 Shantanu Goel 2020-01-05 00:09:52 UTC
My situation with the Dell Inspiron 3668 is just like yours in that it has an HDD and came with Windows preinstalled. I shrank the Windows partition to install Linux on the machine. My machine boots via EFI, but secure boot is disabled.

(In reply to Andy from comment #27)
> On my Dell 5567 I have the exact issue with NO SDD so I don't think it's
> that.
> 
> I do have a Windows partition as well (which the laptop came with,
> preinstalled with Windows) resized down before creating the Linux partitions
> and installing Fedora dual boot - so your second possibility could be the
> case.
> 
> (and clearly the system is in secure boot mode, not legacy)

Comment 30 Peter Jones 2020-01-08 17:57:15 UTC
Can someone who sees this working with 5.2 but not 5.3+ boot with "efi=debug earlyprintk=efi,keep" and no "rhgb quiet" on the kernel command line, and try to get some photos of the boot process?  This will be incredibly slow and tedious, but has a chance of showing us some useful debug information.

Comment 31 Loïc Yhuel 2020-01-08 19:56:48 UTC
(In reply to Peter Jones from comment #30)
> Can someone who sees this working with 5.2 but not 5.3+ boot with "efi=debug
> earlyprintk=efi,keep" and no "rhgb quiet" on the kernel command line, and
> try to get some photos of the boot process?  This will be incredibly slow
> and tedious, but has a chance of showing us some useful debug information.

I assume you mean the boot process of 5.3+ ?

With 5.2.0-1.fc31 it boots fine, but the traces are way too fast (and I don't know if there is something on screen which isn't normal dmesg output).
With 5.4.7-200.fc31, it resets without printing anything (or maybe it's too fast, perhaps someone who has a freeze instead of a reboot would see something).

However, I can confirm disabling the TPM works, it's called "PTT" in my UEFI Setup.
With no TPM, I can boot kernel-5.4.7-200.fc31 with the official grub2-2.02-104.fc31.

Here is a "dmesg | grep -i tpm" on 5.4.5-300.fc31.x86_64 booted with TPM (and obviously the local grub build) :
> efi:  ACPI=0x78649000  ACPI 2.0=0x78649000  SMBIOS=0xf05e0  SMBIOS 3.0=0xf0600  TPMFinalLog=0x78c8f000  ESRT=0x79360598  MPS=0xfcbe0  TPMEventLog=0x68207018 
> [Firmware Bug]: Failed to parse event in TPM Final Events Log
> ACPI: TPM2 0x000000007867A9D8 000034 (v03        Tpm2Tabl 00000001 AMI  00000000)
> tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
> tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
Note that the "[Firmware Bug]: Failed to parse event in TPM Final Events Log" isn't always present, but is there most of the time.

Here is the same with 5.2.0-1.fc31.x86_64, booting from the affected grub :
> efi:  ACPI=0x78649000  ACPI 2.0=0x78649000  SMBIOS=0xf05e0  SMBIOS 3.0=0xf0600  ESRT=0x79360598  MPS=0xfcbe0  TPMEventLog=0x68207018 
> ACPI: TPM2 0x000000007867A9D8 000034 (v03        Tpm2Tabl 00000001 AMI  00000000)
> tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
> tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
There is no TPMFinalLog, I see this was added in 5.3 : "tpm: Reserve the TPM final events table".
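Whether a given boot registered the final events table can be checked by parsing the efi: line from dmesg. A hypothetical Python sketch; it assumes the line format shown above (NAME=0xADDR pairs separated by spaces, where some names carry a version such as "ACPI 2.0").

```python
import re

# NAME=0xADDR pairs; a name may carry a version suffix such as "ACPI 2.0".
_ENTRY = re.compile(r"(\S+(?: \d+\.\d+)?)=(0x[0-9a-fA-F]+)")


def parse_efi_tables(line):
    """Map each config table named on a kernel 'efi:' log line to its address."""
    _, _, body = line.partition("efi:")
    return {name: int(addr, 16) for name, addr in _ENTRY.findall(body)}
```

On the 5.4.5 line above this would give TPMFinalLog = 0x78c8f000, while on the 5.2.0 line only TPMEventLog is present.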

So perhaps the TPM Final Events log is incorrect, which might cause an issue when reading it.
But what would the different grub builds change? Would they trigger different TPM logs (the official 2.02-100 and the local 2.02-104 work, while the official 2.02-104 has the issue) when they are booted by shim?
From the 5.2.0 boot, if that helps :
> efi: mem29: [ACPI Memory NVS    |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000007867e000-0x0000000078fe0fff] (9MB)

Comment 32 Andy 2020-01-08 20:22:12 UTC
Definitely something to this TPM theory...

I can confirm that my Dell Inspiron 5567 boots 5.3.16-200 correctly when the TPM/PTT is turned off with GRUB2 1:2.02-87.fc30, and silently fails to boot 5.3.x when the TPM/PTT is turned on.

Comment 33 Andy 2020-01-08 20:31:21 UTC
AAAND....

With "efi=debug earlyprintk=efi,keep" added and "rhgb quiet" removed...

The ONLY output produced before the freeze when trying to boot 5.3.16-200 is:

EFI stub: UEFI Secure Boot is enabled.

Nothing else.

Comment 34 Shantanu Goel 2020-01-11 23:56:31 UTC
I can also confirm that disabling PTT on my Dell Inspiron 3668 desktop allows kernel 5.4.7-100 to boot successfully but leaving it still results in a blank screen as with the earlier 5.3 kernels.

Comment 35 Shantanu Goel 2020-01-12 00:00:29 UTC
(In reply to Shantanu Goel from comment #34)
> I can also confirm that disabling PTT on my Dell Inspiron 3668 desktop
> allows kernel 5.4.7-100 to boot successfully but leaving it still results in
> a blank screen as with the earlier 5.3 kernel

Sorry for the typo but I meant that leaving PTT enabled in the BIOS still results in a blank screen with the 5.4 kernel as it does with 5.3.

Comment 36 Carl Byington 2020-01-13 00:27:28 UTC
I can confirm that disabling PTT on my Dell Inspiron 15 Gaming 7567 laptop allows 5.4.7-100.fc30.x86_64 to boot. Thanks!

Comment 37 Sergey 2020-01-29 19:13:30 UTC
Dell Inspiron 5567:
The 5.3 kernel does not boot from any of the distribution images. The LiveCD doesn't even show grub.

Comment 38 Tim Cuthbertson 2020-02-06 10:21:03 UTC
I finally got around to following these instructions myself (running with the v104 version of grubx64.efi from fc31). Unfortunately I didn't have any luck:

Disabling PTT: I pressed "clear" and rebooted; I assume this is what people mean by "disable". It still didn't boot.

Replacing "rhgb quiet" with "efi=debug earlyprintk=efi,keep": I didn't see any log output (and it still didn't boot, obviously). It kept showing the dell logo from prior to the boot selection screen, so I don't know if I'm missing some extra flags that would show logs properly.

Comment 39 Rafael 2020-02-06 23:08:20 UTC
The same is happening on my Dell i15-7567-A30P.

I tried running Fedora and Ubuntu from a LiveUSB and it didn't work either. The only thing that makes the system boot is disabling PTT. 

Kernel 5.2.8 was the last version that worked fine.

Comment 40 Andy 2020-02-13 16:59:10 UTC
Fedora 31 LiveUSB version 1-9 exhibits same behaviour on Dell Inspiron 5567 - if PTT enabled, blank screen - if PTT disabled, normal boot.

Comment 41 Andy 2020-02-14 20:15:11 UTC
Another data point (still failing)... reinstalled Fedora, version 31, did all dnf upgrades... still fails with kernel 5.4.18-200 and grub 2.02-105 on Dell Inspiron 5567 if PTT is enabled.   Exact same failure.   If PTT disabled, normal boot.

Comment 42 Hans de Goede 2020-03-11 17:04:12 UTC
For those of you who have tried to boot the non-booting kernels with "efi=debug earlyprintk=efi,keep" on the kernel command line: I have since learned that those are not the correct options for recent kernels to get early boot debugging messages. The current options are:

efi=debug earlycon=efifb keep_bootcon

Can someone who is seeing this try upgrading grub to a known not working version and then booting a 5.3 or newer kernel with this added to the kernel commandline. Hopefully this will show some output. If it shows some output please take a picture or write down the output and report it here.

Comment 43 Andy 2020-03-11 17:17:30 UTC
Just tried it as Hans requests... no change in output.

The only output produced is:

EFI stub: UEFI Secure Boot is enabled.

Comment 44 Hans de Goede 2020-03-11 17:20:19 UTC
(In reply to Andy from comment #43)
> Just tried it as Hans requests... no change in output.
> 
> The only output produced is:
> 
> EFI stub: UEFI Secure Boot is enabled.

That is quite unfortunate (no output), thank you for trying.

Comment 45 Rafael 2020-03-13 20:59:06 UTC
I tried that as well and still no results. Just the same black screen and no output. If I hit the power button, the system shuts down immediately.

Comment 46 Hans de Goede 2020-03-14 11:10:50 UTC
*** Bug 1779385 has been marked as a duplicate of this bug. ***

Comment 47 Hans de Goede 2020-04-26 11:15:49 UTC
There have been some changes to kernel 5.7-rc2 which might help.

Can someone who is seeing this issue please try this kernel:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1497202

See here for generic instructions for installing a kernel directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Comment 48 Javier Martinez Canillas 2020-05-04 10:28:09 UTC
*** Bug 1829172 has been marked as a duplicate of this bug. ***

Comment 49 Hans de Goede 2020-05-06 15:05:42 UTC
Ping? Can anyone who is seeing this issue on their system please give 5.7-rc2 a try, or even better, since it is out now, 5.7-rc4:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1503100

See here for generic instructions for installing a kernel directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Comment 50 Carl Byington 2020-05-07 23:58:52 UTC
The test instructions say these test kernels are not signed, so secure boot needs to be disabled. This bug only shows up if secure boot is enabled. Or is this particular test kernel signed?

Comment 51 Hans de Goede 2020-05-08 07:25:17 UTC
(In reply to Carl Byington from comment #50)
> The test instructions say these test kernels are not signed, so secure boot
> needs to be disabled. This bug only shows up if secure boot is enabled. Or
> is this particular test kernel signed?

These are official kernel builds for rawhide, which are signed, so you can keep secure-boot enabled.

But what makes you say that this only happens with secure-boot enabled? I see no comment(s) about that anywhere in this bug.

Comment 52 Loïc Yhuel 2020-05-08 15:49:59 UTC
(In reply to Hans de Goede from comment #49)
> Ping? Can anyone who is seeing this issue on their system please keep
> 5.7-rc2 a try, or even better since it is out now make that 5.7-rc4:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1503100
> 
I still get the issue with this kernel when the TPM (ie PTT) is enabled, same as 5.6.11-200.fc31.x86_64.
Note that in my previous tests with 5.3/5.4 it was a reset; now it's a freeze (which other people already had).

Btw, this kernel took forever to install : it launched "weak-modules", which created 125 symlinks in /lib/modules/5.7.0-0.rc4.1.fc33.x86_64/weak-updates/ to the 5.6.11 modules, running depmod over and over.
It seems to be https://bugzilla.redhat.com/show_bug.cgi?id=1828455, and will hit anyone with F31/F32 kernel-modules-extra packages.

Comment 53 Hans de Goede 2020-05-08 19:11:02 UTC
(In reply to Loïc Yhuel from comment #52)
> I still get the issue with this kernel when the TPM (i.e. PTT) is enabled,
> same as with 5.6.11-200.fc31.x86_64.

Bummer, thank you for trying.

Comment 54 Loïc Yhuel 2020-05-09 01:42:27 UTC
Created attachment 1686665 [details]
tpm: check event log version before reading final events

The issue happens in efi_retrieve_tpm2_eventlog (efi stub) when parsing the final events table.
Since I suspected something linked to the final events, I bypassed this first read (forcing final_events_table = 0).
Then it happens again in tpm2_calc_event_log_size, but there I can add traces to the code and read them with "earlycon=efifb keep_bootcon", or return early to get a successful boot.

I have final_tbl->version = 1, and final_tbl->nr_events = 31.
Then __calc_tpm2_event_size reads bad values for event->count and efispecid->num_algs, both in the hundreds of millions, which probably makes it loop long enough that it appears frozen.
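
A minimal Python sketch of why corrupted header values look like a hang, assuming the parser is shaped as a nested loop: an outer loop over event->count digests and an inner loop over up to efispecid->num_algs algorithm entries. scan_iterations, header_plausible and MAX_PLAUSIBLE_ALGS are hypothetical names for illustration only; this is not kernel code, and the sample values are illustrative, not the real readings.

```python
# Hypothetical cap: real Spec ID events declare only a few hash algorithms,
# so 16 is an arbitrary illustrative bound, not a value from any spec.
MAX_PLAUSIBLE_ALGS = 16


def scan_iterations(count: int, num_algs: int) -> int:
    """Upper bound on inner-loop steps when, for each of `count` digests,
    the parser walks up to `num_algs` algorithm entries looking for a match."""
    return count * num_algs


def header_plausible(count: int, num_algs: int) -> bool:
    """Hypothetical sanity check run before the nested loop: num_algs must
    be small, and an event should not carry more digests than declared algorithms."""
    if not 0 < num_algs <= MAX_PLAUSIBLE_ALGS:
        return False
    return 0 < count <= num_algs


# Illustrative values "in the hundreds of millions" (not the real readings).
garbage = (300_000_000, 400_000_000)
print(header_plausible(*garbage), scan_iterations(*garbage))
# prints: False 120000000000000000
```

With sane values (for example two digests and two declared algorithms) the bound is a handful of steps; with garbage it is astronomically large, which is consistent with the system appearing frozen rather than crashing.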

log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, so I don't know if passing the first entry (log_location / log_tbl->log) to tpm2_calc_event_log_size is correct.
That could explain the bad efispecid->num_algs, if the cast was incorrect.
But the bad event->count suggests the final events table is either bad, or not the expected format.


I see the char driver (drivers/char/tpm/eventlog/efi.c) skips the final log if "tpm_log_version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2".
I attached a patch which does the same for efi_retrieve_tpm2_eventlog and tpm2_calc_event_log_size, but I don't know if this is the correct fix or not.
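
The gating described here can be modeled as a single predicate. A hedged Python sketch, not the attached patch: it only restates the quoted condition from drivers/char/tpm/eventlog/efi.c, using the format values defined by the TCG EFI Protocol Specification. The function name final_events_to_parse is hypothetical.

```python
# Event log format values from the TCG EFI Protocol Specification.
EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 = 0x00000001
EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 = 0x00000002


def final_events_to_parse(log_version: int, final_tbl_present: bool) -> bool:
    """Only consume the Final Events Table alongside a crypto-agile (TCG_2)
    event log; for a TCG_1_2 (SHA-1) log, skip it entirely."""
    if log_version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2:
        return False
    return final_tbl_present
```

On the machine above, log_tbl->version is TCG_1_2, so this predicate is false and neither the stub nor the size calculation would touch the Final Events Table.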

Comment 55 Javier Martinez Canillas 2020-05-11 12:06:01 UTC
(In reply to Loïc Yhuel from comment #54)
> Created attachment 1686665 [details]
> tpm: check event log version before reading final events
> 
> The issue happens in efi_retrieve_tpm2_eventlog (efi stub) when parsing the
> final events table.
> Since I suspected something linked to the final events, I bypassed this
> first read (forcing final_events_table = 0).
> Then it happens again in tpm2_calc_event_log_size, but here I can add traces
> to the code, and get them with "earlycon=efifb keep_bootcon", or by
> returning early to get a successful boot.
> 
> I have final_tbl->version = 1, and final_tbl->nr_events = 31.
> Then __calc_tpm2_event_size reads bad values for event->count and
> efispecid->num_algs, both in the hundreds of millions, which probably makes
> it loop long enough that it appears frozen.
>

This is great, thanks a lot for finally figuring out the mystery!
 
> log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, so I don't know if
> passing the first entry (log_location / log_tbl->log) to
> tpm2_calc_event_log_size is correct.
> That could explain the bad efispecid->num_algs, if the cast was incorrect.
> But the bad event->count suggests the final events table is either bad, or
> not the expected format.
> 

Indeed. The TCG EFI Protocol Specification [0] mentions that the EFI Final Events Table (EFI_TCG2_FINAL_EVENTS_TABLE) will always contain log entries using the crypto agile format (EFI_TCG2_EVENT_LOG_FORMAT_TCG_2) and not the SHA-1 format (EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2).

If log_tbl->version is EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2, that means the EFI_TCG2_PROTOCOL.GetEventLog() call for EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 either failed or didn't return any entries.
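
The fallback implied here can be sketched as follows, assuming the caller asks GetEventLog() for the crypto-agile format first and only falls back to the SHA-1 format when that request fails or returns no entries. This is a minimal Python model, not the EFI stub's code; GetEventLog is a hypothetical callable standing in for EFI_TCG2_PROTOCOL.GetEventLog(), and pick_event_log is a hypothetical helper.

```python
from typing import Callable, Optional, Tuple

EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 = 0x00000001
EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 = 0x00000002

# Hypothetical stand-in for GetEventLog(): takes a format and returns
# (success, log_location); log_location is None or 0 when there are no entries.
GetEventLog = Callable[[int], Tuple[bool, Optional[int]]]


def pick_event_log(get_event_log: GetEventLog) -> Optional[Tuple[int, int]]:
    """Prefer the crypto-agile log and fall back to the SHA-1 log only if the
    TCG_2 request failed or returned nothing. Returns (format, location)."""
    for fmt in (EFI_TCG2_EVENT_LOG_FORMAT_TCG_2, EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2):
        ok, location = get_event_log(fmt)
        if ok and location:
            return fmt, location
    return None
```

Under this model, a firmware that only answers the TCG_1_2 request ends up with a SHA-1 log, while its Final Events Table is, per the spec, in the crypto-agile format that such a log cannot describe.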

> 
> I see the char driver (drivers/char/tpm/eventlog/efi.c) skips the final log
> if "tpm_log_version != EFI_TCG2_EVENT_LOG_FORMAT_TCG_2".
> I attached a patch which does the same for efi_retrieve_tpm2_eventlog and
> tpm2_calc_event_log_size, but I don't know if this is the correct fix or not.

Yes, drivers/char/tpm/eventlog/efi.c skips the Final Events Log for EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 since as mentioned that's not supported according to the TCG spec.

I think this is a firmware bug, because there seems to be an EFI Final Events Table even though there appear to be no event logs for EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 (or at least none that can be retrieved by the GetEventLog() EFI service).

So I agree with your patch and that the EFI stub shouldn't even attempt to get a Final Events Table if the Event Log only has EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 entries.

But I think the change in drivers/firmware/efi/tpm.c isn't necessary, since after your change to skip the Final Events Table for EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 in the EFI stub, .tpm_final_log will keep its initial value, EFI_INVALID_TABLE_ADDR.

Please post your patch to the linux-integrity mailing list.

[0]: https://trustedcomputinggroup.org/wp-content/uploads/EFI-Protocol-Specification-rev13-160330final.pdf

Comment 56 Javier Martinez Canillas 2020-05-11 12:21:00 UTC
*** Bug 1833148 has been marked as a duplicate of this bug. ***

Comment 57 Loïc Yhuel 2020-05-12 04:08:35 UTC
(In reply to Javier Martinez Canillas from comment #55)
> I think this is a firmware bug, because there seems to be an EFI Final
> Events Table even though there appear to be no event logs for
> EFI_TCG2_EVENT_LOG_FORMAT_TCG_2 (or at least none that can be retrieved by
> the GetEventLog() EFI service).
I checked the first entry in the event log: it is EV_S_CRTM_CONTENTS, not the EV_NO_ACTION event that would contain the tcg_efi_specid_event_head needed by __calc_tpm2_event_size.
Perhaps the final events table is using the old format here, but that would be out of spec.
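
To illustrate the check described above, here is a minimal Python sketch that inspects the first entry of a raw event log. It assumes the SHA-1-format event header (PCR index, event type, 20-byte digest, event size) and the Spec ID event layout from the TCG PC Client firmware profile, which the kernel mirrors as tcg_efi_specid_event_head. first_event and spec_id_num_algs are hypothetical helpers, not kernel functions.

```python
import struct
from typing import Optional, Tuple

EV_NO_ACTION = 0x00000003
EV_S_CRTM_CONTENTS = 0x00000007
# Signature at the start of the Spec ID event data (16 bytes, NUL-terminated).
SPEC_ID_SIGNATURE = b"Spec ID Event03\x00"

# SHA-1-format event header: PCRIndex, EventType, 20-byte digest, EventSize.
_PCR_EVENT_HDR = struct.Struct("<II20sI")


def first_event(log: bytes) -> Tuple[int, bytes]:
    """Return (event type, event data) of the first event in the log."""
    _pcr, ev_type, _digest, size = _PCR_EVENT_HDR.unpack_from(log, 0)
    start = _PCR_EVENT_HDR.size
    return ev_type, log[start:start + size]


def spec_id_num_algs(log: bytes) -> Optional[int]:
    """Return numberOfAlgorithms from the Spec ID event, or None when the
    first entry is not a Spec ID event (i.e. the log is not crypto-agile)."""
    ev_type, data = first_event(log)
    if ev_type != EV_NO_ACTION or len(data) < 28 or data[:16] != SPEC_ID_SIGNATURE:
        return None
    # Layout: signature[16], platformClass (u32), specVersionMinor,
    # specVersionMajor, specErrata, uintnSize (u8 each), numberOfAlgorithms (u32).
    (num_algs,) = struct.unpack_from("<I", data, 24)
    return num_algs
```

A log whose first entry is EV_S_CRTM_CONTENTS, as reported here, yields None: there is no numberOfAlgorithms to size crypto-agile events with, so any value read from that position is garbage.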

> But I think the change in drivers/firmware/efi/tpm.c isn't necessary, since
> after your change to skip the Final Events Table for
> EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2 in the EFI stub, .tpm_final_log will keep
> its initial value, EFI_INVALID_TABLE_ADDR.
I still get the "TPMFinalLog=0x78f8f000" log message, so I think efi.tpm_final_log is set in drivers/firmware/efi/efi.c regardless of what happened in the EFI stub.

> Please post your patch to the linux-integrity mailing list.
Done.

Comment 58 Fedora Update System 2020-05-15 14:22:01 UTC
FEDORA-2020-4336d63533 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4336d63533

Comment 59 Fedora Update System 2020-05-15 14:27:09 UTC
FEDORA-2020-c6b9fff7f8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-c6b9fff7f8

Comment 60 Fedora Update System 2020-05-15 14:27:29 UTC
FEDORA-2020-5a69decc0c has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-5a69decc0c

Comment 61 Amitosh Swain Mahapatra 2020-05-15 16:24:13 UTC
A combination of kernel 5.6.12 and GRUB 2.04-16 works on my Fedora 32, Dell Inspiron 15 7567 machine.  Previously any attempt to boot from the GRUB 2.04 EFI binaries resulted in a system freeze.

Comment 62 Fedora Update System 2020-05-16 04:44:17 UTC
FEDORA-2020-c6b9fff7f8 has been pushed to the Fedora 31 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-c6b9fff7f8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-c6b9fff7f8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 63 Fedora Update System 2020-05-16 05:07:00 UTC
FEDORA-2020-4336d63533 has been pushed to the Fedora 32 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4336d63533`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4336d63533

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 64 Fedora Update System 2020-05-16 05:42:00 UTC
FEDORA-2020-5a69decc0c has been pushed to the Fedora 30 testing repository.
Shortly, you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-5a69decc0c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-5a69decc0c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 65 Carl Byington 2020-05-17 15:39:05 UTC
Thank you!! kernel 5.6.12 now boots on my Dell Inspiron 15 7000 with the TPM (PTT) enabled.

Comment 66 Fedora Update System 2020-05-20 03:15:05 UTC
FEDORA-2020-c6b9fff7f8 has been pushed to the Fedora 31 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 67 Fedora Update System 2020-05-20 03:20:28 UTC
FEDORA-2020-4336d63533 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 68 Fedora Update System 2020-05-20 03:48:20 UTC
FEDORA-2020-5a69decc0c has been pushed to the Fedora 30 stable repository.
If the problem still persists, please make a note of it in this bug report.

