Bug 2240859
Summary: | amdgpu crash: kernel 6.5.x ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Warren Togami <wtogami> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 39 | CC: | acaringi, adscvr, airlied, alciregi, artanzo, awilliam, bskeggs, dsoesman, florian, graeme.w.murray, hdegoede, hpa, jakob, jarod, jforbes, josef, kernel-maint, kparal, lgoncalv, linville, lists, masami256, mchehab, ngompa13, ptalbert, richou672005, robatino, steved |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | AcceptedBlocker | ||
Fixed In Version: | kernel-6.5.6-300.fc39 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-10-09 22:26:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2143446 |
Description
Warren Togami
2023-09-27 00:35:47 UTC
This affects 39 too, and for procedural reasons, I'm shifting it there.

Proposed as a Blocker for 39-final by Fedora user ngompa using the blocker tracking app because: This violates the criterion for default application functionality, as usage of preloaded applications using GPU functionality can cause graphical system freezes and crashes, leading to unrecoverable situations.

I've tested this scratch build from Justin Forbes: https://koji.fedoraproject.org/koji/taskinfo?taskID=106879719
So far, things have been good and I have not experienced any crashes playing games, video calls, or anything else.
Operating System: Fedora Linux 39
KDE Plasma Version: 5.27.8
KDE Frameworks Version: 5.109.0
Qt Version: 5.15.10
Kernel Version: 6.5.5-301.fc39.x86_64 (64-bit)
Graphics Platform: Wayland
Processors: 8 × AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx
Memory: 13.3 GiB of RAM
Graphics Processor: AMD Radeon Vega 8 Graphics
Manufacturer: BESSTAR TECH LIMITED
Product Name: DMAF5
System Version: V1.0
Tested apps: Firefox, Chrome, Discord
Tested games: Sonic Origins, and Sonic Adventure 2

Affected too, on Fedora 38 with a Ryzen 7 5700U: every Vulkan game crashes on 6.5.x kernels. 6.6 not tested.

Unaffected on Fedora 39 with a Ryzen 7 7840U and Radeon 780M.
ThinkPad P14s Gen 4
OS: Fedora release 39 (Thirty Nine) x86_64
Kernel: 6.5.5-300.fc39.x86_64
DE: Plasma 5.27.8
WM: kwin
CPU: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics (16) @ 5.289GHz
GPU: AMD ATI 64:00.0 Phoenix1
Memory: 32 GiB of RAM (including video memory)

Accepted as F39 Final blocker in https://pagure.io/fedora-qa/blocker-review/issue/1348

Per https://gitlab.freedesktop.org/drm/amd/-/issues/2830#note_2106507, a fix has been posted upstream: https://lore.kernel.org/stable/ff4f3163-8d3f-4dae-9bd8-1b1d22bbd61a@amd.com/

It has also been in the Fedora 6.5 tree for a few days, was waiting for 6.5.6 for the build: https://gitlab.com/cki-project/kernel-ark/-/commit/afdab9b20ab7455f752527125b57c92d24601c6e
The scratch build Neal linked above has it included.

I am seeing the same/similar issue with a "high" timeout. Is this the same bug, or shall I open a new one?
Okt 04 09:27:04 apollo13 kernel: [drm] ring 0 timeout to preempt ib
Okt 04 09:27:14 apollo13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=69159, emitted seq=69161
Okt 04 09:27:14 apollo13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 3036 thread gnome-shel:cs0 pid 3105
Okt 04 09:27:14 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Okt 04 09:27:14 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: MODE2 reset
Okt 04 09:27:14 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Okt 04 09:27:14 apollo13 kernel: [drm] PCIE GART of 1024M enabled.
Okt 04 09:27:14 apollo13 kernel: [drm] PTB located at 0x000000F43FC00000
Okt 04 09:27:14 apollo13 kernel: [drm] PSP is resuming...
Okt 04 09:27:15 apollo13 kernel: [drm] reserve 0x400000 from 0xf43f400000 for PSP TMR
Okt 04 09:27:15 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: RAS: optional ras ta ucode is not available
Okt 04 09:27:15 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
Okt 04 09:27:15 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Okt 04 09:27:15 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
Okt 04 09:27:15 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
Okt 04 09:27:15 apollo13 kernel: [drm] DMUB hardware initialized: version=0x01010027
Okt 04 09:27:16 apollo13 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Okt 04 09:27:16 apollo13 kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
Okt 04 09:27:16 apollo13 kernel: [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
Okt 04 09:27:16 apollo13 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Okt 04 09:27:16 apollo13 kernel: [drm] Skip scheduling IBs!
Okt 04 09:27:16 apollo13 kernel: [drm] Skip scheduling IBs!
Okt 04 09:27:16 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(2) failed
Okt 04 09:27:16 apollo13 kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset end with ret = -110
Okt 04 09:27:17 apollo13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Okt 04 09:27:17 apollo13 firefox.desktop[4167]: amdgpu: amdgpu_cs_query_fence_status failed.
Okt 04 09:27:17 apollo13 firefox.desktop[4167]: Crash Annotation GraphicsCriticalError: |[0][GFX1-]: GFX: RenderThread detected a device reset in PostUpdate (t=3350.44) [GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
Okt 04 09:27:17 apollo13 gnome-shell[3036]: amdgpu: amdgpu_cs_query_fence_status failed.
Okt 04 09:27:17 apollo13 kernel: amdgpu_cs_ioctl: 46 callbacks suppressed
Okt 04 09:27:17 apollo13 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Hum, good question. I guess the easiest way to test would be to try the scratch build Neal linked. If that works, I guess it *was* the same problem. If not, new bug.

The scratch build seems to help. Can't say for sure yet since I installed it after the first crash, so I don't know how frequently it would crash, but let's see.

*** Bug 2242506 has been marked as a duplicate of this bug. ***

FEDORA-2023-830d9ec624 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-830d9ec624

FEDORA-2023-50bd7c9c12 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-50bd7c9c12

FEDORA-2023-c3bb819677 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-c3bb819677

I dropped the association of the F37 and F38 updates with this bug, as this bug is an F39 release blocker, so we do not want it being closed by an F37 or F38 update being pushed stable. Now only the F39 update going stable will close this report.

FEDORA-2023-c3bb819677 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-c3bb819677`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-c3bb819677
See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2023-c3bb819677 has been pushed to the Fedora 39 stable repository.
If problem still persists, please make note of it in this bug report.

Swapping mesa-va-drivers-freeworld with mesa-va-drivers fixed this issue.
ThinkPad P14s Gen 2
OS: Fedora release 39 (Thirty Nine) x86_64
Kernel: 6.5.6-300.fc39.x86_64
DE: GNOME 45
WM: mutter
CPU: AMD Ryzen 5 PRO 5650U
GPU: AMD ATI Cezanne
Memory: 32 GiB of RAM (including video memory)

That would point to https://fosstodon.org/@knurd42@social.linux.pizza/111215664021438216, a slightly different bug.
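
For anyone comparing their own journal output against the log quoted in this report (for example to tell a gfx_low timeout from a gfx_high one), the following is a minimal sketch of pulling the ring name and sequence numbers out of an `amdgpu_job_timedout` line. It assumes the message has the same shape as the line quoted above; the names `RingTimeout` and `parse_ring_timeout` are hypothetical helpers made up for this illustration and are not part of any amdgpu or Fedora tooling.

```python
import re
from typing import NamedTuple, Optional


class RingTimeout(NamedTuple):
    """Fields extracted from one 'ring ... timeout' kernel message."""
    ring: str
    signaled_seq: int
    emitted_seq: int


# Pattern modelled on the message quoted in this report; other kernel
# versions may word the message differently (assumption).
_TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return the parsed timeout, or None if the line is not a ring timeout."""
    match = _TIMEOUT_RE.search(line)
    if match is None:
        return None
    return RingTimeout(
        ring=match["ring"],
        signaled_seq=int(match["signaled"]),
        emitted_seq=int(match["emitted"]),
    )


# Sample line copied from the log in this report.
sample = (
    "Okt 04 09:27:14 apollo13 kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx_high timeout, signaled seq=69159, emitted seq=69161"
)
hit = parse_ring_timeout(sample)
print(hit)
```

Lines that are not ring timeouts return `None`, so the function can be applied to every line of saved `journalctl -k` output and the non-matching lines filtered out.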