
Bug 1825046

Summary: NVIDIA Turing GPU "secure boot" is broken and can lead to full system lockups
Product: [Fedora] Fedora
Reporter: Ben Skeggs <bskeggs>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 32
CC: airlied, awilliam, bcotton, bskeggs, fedoraproject, gmarr, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, robatino, sgallagh, steved
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: RejectedBlocker AcceptedFreezeException
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-25 17:09:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1705306

Description Ben Skeggs 2020-04-16 23:01:51 UTC
A few MODULE_FIRMWARE() lines are missing from the Nouveau DRM driver, so the SEC2 RTOS firmware is left out of the initramfs. As a result, the high-secure firmware binaries fail to properly initialise the GPU, leaving it in an odd state which can lead to full system hangs.
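
To illustrate the mechanism, here is a minimal Python sketch. It assumes the initramfs builder copies only the firmware files that a kernel module declares (via MODULE_FIRMWARE()), so anything the driver requests at load time but never declared is absent at boot. The function names and the firmware file names below are hypothetical and chosen only for illustration; they are not taken from the actual patch.

```python
def initramfs_firmware(declared):
    """Model an initramfs builder (assumption): it copies only declared firmware."""
    return set(declared)


def firmware_missing_at_boot(requested, declared):
    """Return firmware the driver requests that the modelled initramfs lacks."""
    available = initramfs_firmware(declared)
    return sorted(name for name in requested if name not in available)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    requested = [
        "nvidia/tu102/acr/ucode_asb.bin",
        "nvidia/tu102/sec2/image.bin",
    ]
    # The SEC2 entry is deliberately left undeclared to mimic the bug.
    declared = ["nvidia/tu102/acr/ucode_asb.bin"]
    print(firmware_missing_at_boot(requested, declared))
```

On a real system, the declared list can be inspected with `modinfo -F firmware nouveau`, and the contents of the initramfs can be listed with `lsinitrd`.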

I've received two systems from Lenovo (Thinkpad P1, and P53) that are considered very important to be working correctly with Fedora 32, and the issue manifests most easily as suspend/resume failing in Discrete GPU mode.

Other, more severe failure modes are possible due to the undefined state of the GPU after the ASB firmware has failed to load.

The patch has already been pulled into the Fedora kernel for the next build, and this bug is to propose the issue as a blocker so the installation image can contain the fix.

Comment 1 Fedora Blocker Bugs Application 2020-04-16 23:05:47 UTC
Proposed as a Blocker for 32-final by Fedora user bskeggs using the blocker tracking app because:

 We have requests from Lenovo to ensure that the Thinkpad P1/P53 are well supported in the F32 release, and this bug severely impacts system stability.

Comment 2 Adam Williamson 2020-04-16 23:15:21 UTC
it is really difficult for us to accept a bug that is not publicly visible as a blocker or FE. Fedora is a public project. Does the description really need to be private? if so, is there at least some sanitized version we can make visible?

the *change to the kernel package itself* is necessarily publicly visible, so I don't see how this is secret...

Comment 3 Fedora Update System 2020-04-17 19:56:51 UTC
FEDORA-2020-bebcd88161 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-bebcd88161

Comment 4 Fedora Update System 2020-04-17 22:07:20 UTC
FEDORA-2020-bebcd88161 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-bebcd88161`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-bebcd88161

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Ben Skeggs 2020-04-20 01:39:27 UTC
(In reply to Adam Williamson from comment #2)
> it is really difficult for us to accept a bug that is not publicly visible
> as a blocker or FE. Fedora is a public project. Does the description really
> need to be private? if so, is there at least some sanitized version we can
> make visible?
> 
> the *change to the kernel package itself* is necessarily publicly visible,
> so I don't see how this is secret...

It doesn't need to be private, I've fixed that now.

Comment 7 Stephen Gallagher 2020-04-20 12:29:19 UTC
While this is a serious bug, Turing-based processors are still relatively new (and expensive!). I don't think we'd be likely to block on such a small subset of hardware at the Go/No-Go meeting, particularly with this bug coming in after we've already slipped once.

I'm +1 FE for this. Given that a fix is ready and we are going to be respinning for another blocker bug anyway, we should get this in.

Comment 8 Ben Cotton 2020-04-20 13:17:14 UTC
+1 FE for sure. I'd entertain an argument as to why it should be a blocker.

Comment 9 Adam Williamson 2020-04-20 15:17:17 UTC
Yeah, same place as Stephen and Ben for me, +1 FE for sure, sceptical on blocker (but as we need to respin it's somewhat academic).

Comment 10 Stephen Gallagher 2020-04-20 15:56:02 UTC
(In reply to Adam Williamson from comment #9)
> Yeah, same place as Stephen and Ben for me, +1 FE for sure, sceptical on
> blocker (but as we need to respin it's somewhat academic).

Well, it's academic as long as the provided fix actually works.

FWIW, I asked FESCo to rule on whether they think it needed to be a special blocker and the result was "no".

Comment 11 Geoffrey Marr 2020-04-20 17:44:50 UTC
Discussed during the 2020-04-20 blocker review meeting: [0]

The decision to classify this bug as a "RejectedBlocker" and an "AcceptedFreezeException" was made as we think the impact here is too narrow to qualify as blocker (it only affects a very new NVIDIA hardware generation), but certainly significant enough to accept as an FE.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-04-20/f32-blocker-review.2020-04-20-16.01.txt

Comment 12 Fedora Program Management 2021-04-29 16:47:58 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 32 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Ben Cotton 2021-05-25 17:09:30 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.