Bug 2123998 - Mesa 22.2.0~rc3 is built without support for common video codecs, missing mesa-va-drivers might cause issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: 37
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dave Airlie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Duplicates: 2130286
Depends On:
Blocks: F37FinalBlocker
Reported: 2022-09-03 18:52 UTC by Brendan William
Modified: 2023-12-06 07:24 UTC
CC List: 33 users

Fixed In Version: mesa-22.2.0-7.fc37 gstreamer1-vaapi-1.20.3-3.fc37
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-13 11:44:41 UTC
Type: Bug
Embargoed:


Attachments
system journal when gdm fails to start (deleted), 2022-10-05 19:36 UTC, Kamil Páral, no flags
journal logs (deleted), 2022-10-06 12:48 UTC, Jiri Eischmann, no flags
list of packages (deleted), 2022-10-06 12:49 UTC, Jiri Eischmann, no flags
backtrace of gnome-shell (deleted), 2022-10-10 16:39 UTC, Jiri Eischmann, no flags
journalctl -b output (deleted), 2022-10-14 17:04 UTC, Brian Morrison, no flags
rpm -qa | sort output (deleted), 2022-10-14 17:05 UTC, Brian Morrison, no flags
Hung gnome-shell gdb output (deleted), 2022-10-14 22:17 UTC, Brian Morrison, no flags


Links
System: GNOME Gitlab
ID: GNOME gnome-shell issues 5710
Private: 0
Priority: None
Status: opened
Summary: GNOME Shell calls into GStreamer
Last Updated: 2022-10-27 14:20:36 UTC

Description Brendan William 2022-09-03 18:52:11 UTC
Description of problem:
Mesa 22.2 introduces a build option that allows building with or without support for some patent encumbered codecs such as h264 and h265 for encoding and decoding, and VC-1 for decoding. By default, none of these codecs are enabled, and this is the case with the most recent Mesa 22.2 RC that is currently in Fedora 37.

Prior to this change, support for these codecs was built by default, which is the case with Mesa 22.1.x in Fedora 36. Building without support for these codecs has implications for hardware accelerated encode/decode support via VA-API.

Version-Release number of selected component (if applicable):
mesa-22.2.0~rc3-1.fc37

How reproducible:
Always

Steps to Reproduce:
1. Ensure libva and libva-utils are installed on Fedora 37 with the most recent version of mesa
2. Run vainfo in terminal of choice
3. Notice lack of support for the aforementioned codecs (h264, h265, VC-1)

Actual results:
libva info: VA-API version 1.15.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_15
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.15 (libva 2.15.0)
vainfo: Driver version: Mesa Gallium driver 22.2.0-rc3 for RENOIR (renoir, LLVM 14.0.5, DRM 3.47, 5.19.6-300.fc37.x86_64)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileVP9Profile0            :	VAEntrypointVLD
      VAProfileVP9Profile2            :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc

Expected results:
libva info: VA-API version 1.15.0
libva info: Trying to open /usr/lib64/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_15
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.15 (libva 2.15.0)
vainfo: Driver version: Mesa Gallium driver 22.2.0-rc3 for RENOIR (renoir, LLVM 14.0.5, DRM 3.47, 5.19.6-300.fc37.x86_64)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :	VAEntrypointVLD
      VAProfileMPEG2Main              :	VAEntrypointVLD
      VAProfileVC1Simple              :	VAEntrypointVLD
      VAProfileVC1Main                :	VAEntrypointVLD
      VAProfileVC1Advanced            :	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:	VAEntrypointEncSlice
      VAProfileH264Main               :	VAEntrypointVLD
      VAProfileH264Main               :	VAEntrypointEncSlice
      VAProfileH264High               :	VAEntrypointVLD
      VAProfileH264High               :	VAEntrypointEncSlice
      VAProfileHEVCMain               :	VAEntrypointVLD
      VAProfileHEVCMain               :	VAEntrypointEncSlice
      VAProfileHEVCMain10             :	VAEntrypointVLD
      VAProfileHEVCMain10             :	VAEntrypointEncSlice
      VAProfileJPEGBaseline           :	VAEntrypointVLD
      VAProfileVP9Profile0            :	VAEntrypointVLD
      VAProfileVP9Profile2            :	VAEntrypointVLD
      VAProfileNone                   :	VAEntrypointVideoProc
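
Editorial sketch (not part of the original report): the difference between the two vainfo listings above can be computed mechanically. The Python below is a minimal illustration, assuming the profile lines keep the `VAProfile... : VAEntrypoint...` shape shown in this report. The function names `parse_profiles` and `missing_profiles` are hypothetical helpers, and the sample strings are shortened excerpts of the listings above.

```python
import re

# Matches lines such as "      VAProfileH264Main               :	VAEntrypointVLD"
PROFILE_LINE = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")


def parse_profiles(vainfo_output):
    """Return the set of (profile, entrypoint) pairs listed in vainfo output."""
    pairs = set()
    for line in vainfo_output.splitlines():
        match = PROFILE_LINE.match(line)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def missing_profiles(actual, expected):
    """Pairs present in the expected output but absent from the actual one."""
    return sorted(parse_profiles(expected) - parse_profiles(actual))


# Shortened excerpts of the "Actual results" and "Expected results" above.
ACTUAL = """
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
"""

EXPECTED = """
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
"""

if __name__ == "__main__":
    for profile, entrypoint in missing_profiles(ACTUAL, EXPECTED):
        print(f"missing: {profile} ({entrypoint})")
```

Feeding it the full listings from this report should list the VC-1, H.264 and HEVC entries, since those are the only lines that differ between the two.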

Comment 1 Fedora Update System 2022-09-12 17:21:28 UTC
FEDORA-2022-7aafc1efd1 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-7aafc1efd1

Comment 2 Fedora Update System 2022-09-12 17:22:45 UTC
FEDORA-2022-d6edd4beb0 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-d6edd4beb0

Comment 3 Fedora Update System 2022-09-12 17:26:11 UTC
FEDORA-2022-7aafc1efd1 has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 4 Fedora Update System 2022-09-13 03:11:32 UTC
FEDORA-2022-d6edd4beb0 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-d6edd4beb0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-d6edd4beb0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Fedora Update System 2022-09-18 00:17:22 UTC
FEDORA-2022-d6edd4beb0 has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 7 František Zatloukal 2022-09-27 17:31:17 UTC
*** Bug 2130286 has been marked as a duplicate of this bug. ***

Comment 8 Nicolas Chauvet (kwizart) 2022-09-28 06:24:15 UTC
Question for mesa maintainers.


To make it simpler for a 3rd party build of mesa to provide only the missing codecs (vaapi backends),
I think it would be wise to remove the existing mesa backend that has only the MPEG-2 codec enabled.

Thanks in advance for your understanding.

Comment 9 Nicolas Chauvet (kwizart) 2022-09-28 11:14:23 UTC
Or another way would be to put the vaapi-backend into a separate sub-package (so we can later swap with another sub-package without having to build the whole mesa package).

Comment 10 Pete Walter 2022-09-30 20:18:39 UTC
(In reply to Nicolas Chauvet (kwizart) from comment #9)
> Or another way would be to put the vaapi-backend into a separate sub-package
> (so we can later swap with another sub-package without having to build the
> whole mesa package).

Done in https://src.fedoraproject.org/rpms/mesa/c/07e1e0b1628d9c55d3858c4655409768c5c0b5de?branch=f37

Comment 11 Fedora Update System 2022-09-30 20:26:09 UTC
FEDORA-2022-1a1059f24e has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-1a1059f24e

Comment 12 Fedora Update System 2022-10-01 02:13:18 UTC
FEDORA-2022-1a1059f24e has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-1a1059f24e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-1a1059f24e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 13 Luya Tshimbalanga 2022-10-02 20:24:25 UTC
The now obsolete update broke both AMD hardware and Nvidia hardware running the nouveau driver, because the vaapi drivers that were split out of the mesa package are missing. The suggested fix is to add the missing dependency.

Comment 14 Fedora Blocker Bugs Application 2022-10-04 15:49:11 UTC
Proposed as a Blocker for 37-final by Fedora user pwalter using the blocker tracking app because:

 https://bodhi.fedoraproject.org/updates/FEDORA-2022-494754fe0f broke mesa for radeon and nouveau users due to being pushed without matching libva build. This needs fixing one way or another before release.

Comment 15 Kamil Páral 2022-10-05 10:59:48 UTC
Luya, Pete, can you please clarify what exactly is currently broken? Does it mean that the desktop now doesn't start at all for AMD and nouveau? Or do attempts to use VAAPI now crash? Please clarify, thank you.

Comment 16 Jiri Eischmann 2022-10-05 17:16:59 UTC
After updating to the latest mesa my system failed to start (it got stuck on switching to GDM) because it was missing mesa-va-drivers which didn't get installed automatically. After I installed it the system booted just fine. I'm using AMD RX570.

Comment 17 Kamil Páral 2022-10-05 17:43:08 UTC
@walter.pete @kwizart What happened in https://bodhi.fedoraproject.org/updates/FEDORA-2022-494754fe0f was very unfortunate. F37 now seems broken for all Radeon/Nouveau users, if I understand it correctly. Because the libva update was pulled back, nothing now pulls in mesa-va-drivers, and it seems that having libva installed (that's by default) and missing mesa-va-drivers results in GDM not starting (see comment 16). This is very bad.

If I described the situation correctly, we need an IMMEDIATE fix (sorry for the caps, but I really want to highlight this). With each passing hour, more users are affected. Either we need to revert the mesa-va-drivers split, or we need libva to *require* mesa-va-drivers (not merely recommend it, because it doesn't work without it, and some people don't install recommended packages). Or we need some other way to ensure mesa-va-drivers is installed for everyone by default.

Can we please resolve this in the fastest way possible, and only then discuss the best approach to do this in an rpmfusion-friendly way? Thanks!

Comment 18 Nicolas Chauvet (kwizart) 2022-10-05 18:31:51 UTC
I can only do that in a coordinated manner, at its "own pace"!
See also https://bugzilla.rpmfusion.org/show_bug.cgi?id=6426#c15 (and later comments)

Comment 19 Kamil Páral 2022-10-05 19:36:06 UTC
Created attachment 1916287 [details]
system journal when gdm fails to start

I worked with Jiri to retrieve system journal for the failed boot when gdm doesn't start (comment 16). We've had troubles (might be a systemd bug related to timezones), but the log is now attached. Unfortunately I don't see an exact cause of the failure, only this:

říj 05 18:47:52 fedora-workstation gnome-session[1244]: gnome-session-binary[1244]: WARNING: Application 'org.gnome.Shell.desktop' failed to register before timeout
říj 05 18:47:52 fedora-workstation gnome-session-binary[1244]: WARNING: Application 'org.gnome.Shell.desktop' failed to register before timeout
říj 05 18:47:52 fedora-workstation gnome-session-binary[1244]: Unrecoverable failure in required component org.gnome.Shell.desktop

However, installing mesa-va-drivers (no other package updated) immediately fixed the problem.

Comment 20 Dave Airlie 2022-10-06 02:22:42 UTC
I cannot reproduce this. I just installed an f37 system with gnome where libva is installed and mesa-va-drivers isn't, and it works fine on my AMD FIJI GPU.

Is there any sign of a gnome-shell core file or anything with a backtrace?

Comment 21 Tomas Popela 2022-10-06 06:41:16 UTC
I can boot fine on a P14s G2 AMD laptop running Silverblue 37 without mesa-va-drivers (Ryzen 7 PRO 5850U and RENOIR GPU)

Comment 22 Kamil Páral 2022-10-06 07:16:16 UTC
OK, I tested booting Fedora-Workstation-Live-x86_64-37-20221005.n.0.iso (which doesn't have mesa-va-drivers present) on my PC with Radeon 580, and it boots fine. I also tested moving away radeonsi_drv_video.so on F36 which is currently installed there, and it also booted OK (and only vainfo complained, no other problem). So this issue is not universal, which is great news, thanks for verification.

At the same time, I don't believe that Jiri simply had a random error which conveniently appeared at the same time we're shuffling VA drivers around, and which was magically fixed right after installing mesa-va-drivers. There has to be some connection to it. I talked to Jiri; he tried rebooting several times, with different kernels and with the splash screen disabled. Nothing helped until I advised him to boot into runlevel 3 and install that mesa subpackage. Unfortunately, there's nothing in his ABRT or coredumpctl, and the logs don't seem to be helpful. We'll try to dig more into it on his computer. But I'm afraid we have some corner case here which can affect certain users in some unknown cases.

Comment 23 Jiri Eischmann 2022-10-06 12:48:52 UTC
Created attachment 1916479 [details]
journal logs

Comment 24 Jiri Eischmann 2022-10-06 12:49:21 UTC
Created attachment 1916480 [details]
list of packages

Comment 25 Jiri Eischmann 2022-10-06 12:49:38 UTC
I removed the package and the OS failed to boot into the login screen again. It's an upgraded installation from 2019. When I tried to boot the live image of F37 it booted correctly. So it's something that is in my installation and not on the default F37 that triggers the problem. I'm attaching the journal logs from the boot and the list of packages installed. I haven't found any other clues. gnome-shell is among the running processes, so it doesn't crash, which explains why there is no coredump. When the boot gets stuck, there is only the boot splash screen and no input events (shortcuts etc.) make any change. When I boot with the splash screen off, it switches from the screen with boot messages to a blank screen with a cursor in the upper left corner and gets stuck; again, no shortcuts work.

Comment 26 Leon Grünewald 2022-10-06 18:33:49 UTC
Hi, I got the same error as Mr Páral mentioned after updating today. I got an AMD RX 6600 (non XT). Installing mesa-va-drivers seemingly fixed GDM.

Comment 27 Dave Airlie 2022-10-07 05:04:58 UTC
any gnome shell extensions installed on those systems? something that might be using video enc or dec?

Comment 28 Jiri Eischmann 2022-10-07 07:02:22 UTC
The only enabled gnome shell extension on my system is the default Background Logo.

Comment 29 Leon Grünewald 2022-10-07 07:03:57 UTC
Hello Mr. Airlie

Installed are:
trayIconsReloaded
apps-menu.github.com
background-logo
launch-new-instance.github.com
places-menu.github.com
window-list.github.com
gamemode.me
appindicatorsupport.com

Enabled on my user account are:
trayIconsReloaded

Here also a list of all installed packages on my system (sudo dnf list --installed):
https://paste.centos.org/view/a2215cfc

Comment 30 Adam Williamson 2022-10-10 12:37:08 UTC
So...comparing Jiri's logs to mine from a working system, it seems like this is where things get kinda stuck:

říj 06 11:36:22 fedora-workstation gnome-shell[1253]: Using Wayland display name 'wayland-0'

that's the last message from gnome-shell on his system. On mine, it prints a lot of stuff after that:

Oct 10 07:52:19 t16.happyassassin.net gnome-shell[1231]: Using Wayland display name 'wayland-0'
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: JS WARNING: [resource:///org/gnome/gjs/modules/core/overrides/Gio.js 287]: Too many arguments to method Gio.AsyncInitable.init_async: expected 3, got 4
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: JS WARNING: [resource:///org/gnome/gjs/modules/core/overrides/Gio.js 287]: Too many arguments to method Gio.AsyncInitable.init_async: expected 3, got 4
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: Unset XDG_SESSION_ID, getCurrentSessionProxy() called outside a user session. Asking logind directly.
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: Will monitor session c1

and so on. Looking at where we are when that message gets printed, we're at the end of `meta_wayland_compositor_new` in mutter src/wayland/meta-wayland.c ; after printing that message it sets a few environment vars and returns. It looks like it's called from `meta_context_start` in src/core/meta-context.c, and the next thing that does is:

  priv->display = meta_display_new (context, error);
  if (!priv->display)
    {
      priv->state = META_CONTEXT_STATE_TERMINATED;
      return FALSE;
    }

  priv->main_loop = g_main_loop_new (NULL, FALSE);

  priv->state = META_CONTEXT_STATE_STARTED;

and then return. So I'm gonna guess it's getting stuck there, somehow.

I say Shell gets "stuck" because it doesn't seem to proceed any further, doesn't log anything else, and eventually gnome-session gets tired of waiting for it:

říj 06 11:37:51 fedora-workstation gnome-session[1245]: gnome-session-binary[1245]: WARNING: Application 'org.gnome.Shell.desktop' failed to register before timeout

but Jiri says it doesn't *crash*, the process is still there. So, it seems like it's somehow stuck.
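
Editorial sketch (not part of the original comment): the comparison described above, finding the last line a given process logged before going quiet, can be automated. The Python below is a rough illustration, assuming the short journal line layout seen in this report (`<month> <day> <time> <host> <process>[<pid>]: <message>`). `last_message` is a hypothetical helper name, and the sample text is excerpted from the logs quoted above.

```python
import re

# One journal line in the short format quoted in this report.
# The month token is matched loosely so localized names (e.g. "říj") also work.
JOURNAL_LINE = re.compile(
    r"^\S+\s+\d+\s+[\d:]+\s+\S+\s+(?P<proc>[^\[\s:]+)\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)


def last_message(journal_text, process):
    """Return the last message logged by `process`, or None if it never logged."""
    last = None
    for line in journal_text.splitlines():
        match = JOURNAL_LINE.match(line)
        if match and match.group("proc") == process:
            last = match.group("msg")
    return last


# Excerpts from the stuck system and the working system quoted above.
STUCK = """\
říj 06 11:36:22 fedora-workstation gnome-shell[1253]: Using Wayland display name 'wayland-0'
říj 06 11:37:51 fedora-workstation gnome-session[1245]: gnome-session-binary[1245]: WARNING: Application 'org.gnome.Shell.desktop' failed to register before timeout
"""

WORKING = """\
Oct 10 07:52:19 t16.happyassassin.net gnome-shell[1231]: Using Wayland display name 'wayland-0'
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: Unset XDG_SESSION_ID, getCurrentSessionProxy() called outside a user session. Asking logind directly.
Oct 10 07:52:20 t16.happyassassin.net gnome-shell[1231]: Will monitor session c1
"""

if __name__ == "__main__":
    print("stuck system:  ", last_message(STUCK, "gnome-shell"))
    print("working system:", last_message(WORKING, "gnome-shell"))
```

On the stuck excerpt the last gnome-shell message is the Wayland display name line, which matches the observation that nothing further is logged before gnome-session times out.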

Comment 31 Adam Williamson 2022-10-10 12:55:46 UTC
Jiri: could you possibly get gdm to run with MUTTER_VERBOSE=1 and MUTTER_DEBUG=1 set? It might give us more idea exactly where things are going wrong...

Comment 32 Jiri Eischmann 2022-10-10 16:39:37 UTC
Created attachment 1917099 [details]
backtrace of gnome-shell

Comment 33 Jiri Eischmann 2022-10-10 16:41:03 UTC
I've submitted the backtrace of gnome-shell I got from gdb. I've also tried to set SELinux to permissive, didn't help, also tried to switch to runlevel 3 and back to 5, didn't help either.

Comment 34 Kamil Páral 2022-10-10 16:58:11 UTC
(To be clear, the above is not a backtrace of a crash, but a backtrace of a running gnome-shell process (under gdm) which seems to be stuck and waiting for something).

Comment 35 Ray Strode [halfline] 2022-10-10 17:17:05 UTC
#0  0x00007f41ee922e26 in ppoll () at /lib64/libc.so.6
#1  0x00007f41c7ee5bff in gst_poll_wait () at /lib64/libgstreamer-1.0.so.0
#2  0x00007f41c7eeba1e in exchange_packets () at /lib64/libgstreamer-1.0.so.0
#3  0x00007f41c7eecfbf in plugin_loader_free.lto_priv () at /lib64/libgstreamer-1.0.so.0
#4  0x00007f41c7ef8353 in gst_update_registry () at /lib64/libgstreamer-1.0.so.0
#5  0x00007f41c7e8a48a in init_post () at /lib64/libgstreamer-1.0.so.0
#6  0x00007f41ef947af1 in g_option_context_parse () at /lib64/libglib-2.0.so.0
#7  0x00007f41c7e829ff in gst_init_check () at /lib64/libgstreamer-1.0.so.0

Seems to be trying to rebuild the gdm gstreamer registry and it's getting stuck waiting for the rebuild to finish?

Comment 36 Adam Williamson 2022-10-10 17:18:01 UTC
Discussed at the 2022-10-10 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2022-10-10/f37-blocker-review.2022-10-10-16.01.html . It was agreed to accept this as a blocker as a conditional violation of the Basic "graphical systems must boot to the login screen" requirement, at least while we continue to investigate it - we expect to revisit this at Thursday's go/no-go.

Comment 37 Ray Strode [halfline] 2022-10-10 17:37:44 UTC
jiri can you post a backtrace of the hung gst-plugin-scanner process too?

Comment 38 Ray Strode [halfline] 2022-10-10 17:43:49 UTC
just reading through the gstreamer1-vaapi code I see this bit:

  /* If no neighboor, or application not interested, use system default */
  if (plugin->gl_context) {
    display = gst_vaapi_create_display_from_gl_context (plugin->gl_context);
    /* Cannot instantiate VA display based on GL context. Reset the
     *  requested display type to ANY to try again */
    if (!display)
      gst_vaapi_plugin_base_set_display_type (plugin,
          GST_VAAPI_DISPLAY_TYPE_ANY);
  }

I'm just completely guessing here (not knowing this part of the stack at all),
but it seems at least conceivable to me that the 

gst_vaapi_create_display_from_gl_context (...)

fails if mesa-va-drivers isn't installed, and that retrying with 
GST_VAAPI_DISPLAY_TYPE_ANY makes it try to talk to the wayland socket that gnome-shell isn't managing yet because it's stuck waiting on this process to finish.

anyway, that's one theory...

Comment 39 Ray Strode [halfline] 2022-10-10 18:22:01 UTC
Jiri sent me the backtrace I asked for in comment 37 through IRC. It shows that gst-plugin-scanner is trying to connect to Xwayland. The full trace is here:

Thread 1 (Thread 0x7f9086d42740 (LWP 1281) "gst-plugin-scan"):
#0  0x00007f9087229cf4 in __GI___poll (fds=fds@entry=0x7ffe9a958a10, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
#1  0x00007f908657f583 in poll (__timeout=-1, __nfds=1, __fds=0x7ffe9a958a10) at /usr/include/bits/poll2.h:39
        pfd = {fd = 7, events = 1, revents = 0}
        ret = <optimized out>
        done = 0
        ret = <optimized out>
        done = 0
#2  read_block (len=8, buf=0x55cc604519d0, fd=7) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_in.c:388
        pfd = {fd = 7, events = 1, revents = 0}
        ret = <optimized out>
        done = 0
        ret = <optimized out>
        done = 0
#3  _xcb_in_read_block (c=c@entry=0x55cc60453330, buf=0x55cc604519d0, len=len@entry=8) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_in.c:1075
        ret = <optimized out>
        done = 0
#4  0x00007f9086582bc2 in read_setup (c=0x55cc60453330) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_conn.c:177
        c = 0x55cc60453330
#5  xcb_connect_to_fd (fd=fd@entry=7, auth_info=auth_info@entry=0x7ffe9a958b50) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_conn.c:359
        c = 0x55cc60453330
#6  0x00007f90865834d6 in xcb_connect_to_display_with_auth_info (displayname=displayname@entry=0x0, auth=auth@entry=0x0, screenp=screenp@entry=0x0) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_util.c:532
        fd = <optimized out>
        display = 1024
        host = 0x55cc60451b00 ""
        protocol = 0x0
        ourauth = {namelen = 18, name = 0x55cc60451a60 "MIT-MAGIC-COOKIE-1", datalen = 16, data = 0x55cc60451a40 "\005\346wnj\246@.\370\246\274\352\004\245\375*"}
        c = <optimized out>
        parsed = <optimized out>
#7  0x00007f908658357e in xcb_connect (displayname=displayname@entry=0x0, screenp=screenp@entry=0x0) at /usr/src/debug/libxcb-1.13.1-10.fc37.x86_64/src/xcb_util.c:489
#8  0x00007f908682115a in _XConnectXCB (dpy=0x55cc604520c0, display=0x0, screenp=0x7ffe9a958d6c) at /usr/src/debug/libX11-1.8.1-2.fc37.x86_64/src/xcb_disp.c:78
        host = 0x55cc60451b00 ""
        n = 1024
        c = <optimized out>
#9  0x00007f9086810f87 in XOpenDisplay (display=display@entry=0x0) at /usr/src/debug/libX11-1.8.1-2.fc37.x86_64/src/OpenDis.c:129
        dpy = 0x55cc604520c0
        i = <optimized out>
        j = <optimized out>
        k = <optimized out>
        display_name = 0x7ffe9a95af76 ":1024"
        setup = 0x0
        iscreen = 0
        vendorlen = <optimized out>
        u = {setup = <optimized out>, failure = <optimized out>, vendor = <optimized out>, sf = <optimized out>, rp = <optimized out>, dp = <optimized out>, vp = <optimized out>}
        setuplength = <optimized out>
        usedbytes = 0
        mask = <optimized out>
        conn_buf_size = <optimized out>
        xlib_buffer_size = <optimized out>
#10 0x00007f908696a694 in drm_auth_x11_init (auth=0x7ffe9a958e80) at drm/va_drm_auth_x11.c:114
        vtable = 0x7ffe9a958e88
        libva_x11_name = "libva-x11.so.2\000\300"
        ret = 14
        auth = {handle = 0x55cc60440040, vtable = {x11_open_display = 0x7f9086810eb0 <XOpenDisplay>, x11_close_display = 0x7f9086802e00 <XCloseDisplay>, va_dri2_query_extension = 0x7f9086923d70 <VA_DRI2QueryExtension>, va_dri2_query_version = 0x7f9086923db0 <VA_DRI2QueryVersion>, va_dri2_authenticate = 0x7f90869240b0 <VA_DRI2Authenticate>}, display = 0x0, window = 0}
        success = false
        ctx = <optimized out>
        drm_state = 0x55cc604513d0
        magic = 1
        ret = <optimized out>
#11 va_drm_authenticate_x11 (fd=6, magic=1) at drm/va_drm_auth_x11.c:163
        auth = {handle = 0x55cc60440040, vtable = {x11_open_display = 0x7f9086810eb0 <XOpenDisplay>, x11_close_display = 0x7f9086802e00 <XCloseDisplay>, va_dri2_query_extension = 0x7f9086923d70 <VA_DRI2QueryExtension>, va_dri2_query_version = 0x7f9086923db0 <VA_DRI2QueryVersion>, va_dri2_authenticate = 0x7f90869240b0 <VA_DRI2Authenticate>}, display = 0x0, window = 0}
        success = false
        ctx = <optimized out>
        drm_state = 0x55cc604513d0
        magic = 1
        ret = <optimized out>
#12 va_drm_authenticate (magic=1, fd=6) at drm/va_drm_auth.c:37
        ctx = <optimized out>
        drm_state = 0x55cc604513d0
        magic = 1
        ret = <optimized out>
#13 va_DisplayContextGetNumCandidates (pDisplayContext=<optimized out>, num_candidates=<optimized out>) at drm/va_drm.c:73
        ctx = <optimized out>
        drm_state = 0x55cc604513d0
        magic = 1
        ret = <optimized out>
#14 0x00007f9086aaf1af in va_getDriverNumCandidates (num_candidates=0x7ffe9a958f4c, dpy=0x55cc60451400) at /usr/src/debug/libva-2.15.0-2.fc37.x86_64/va/va.c:357
        pDisplayContext = 0x55cc60451400
        driver_name_env = 0x0
        vaStatus = 0
        ctx = 0x55cc60451570
        driver_name = 0x0
        num_candidates = 1
        candidate_index = 0
        vaStatus = <optimized out>
        __func__ = "vaInitialize"
#15 vaInitialize (dpy=dpy@entry=0x55cc60451400, major_version=major_version@entry=0x7ffe9a958fa4, minor_version=minor_version@entry=0x7ffe9a958fa0) at /usr/src/debug/libva-2.15.0-2.fc37.x86_64/va/va.c:730
        driver_name = 0x0
        num_candidates = 1
        candidate_index = 0
        vaStatus = <optimized out>
        __func__ = "vaInitialize"
#16 0x00007f9086ccec6c in vaapi_initialize (dpy=0x55cc60451400) at ../gst-libs/gst/vaapi/gstvaapiutils.c:113
        major_version = 21964
        minor_version = 1615139840
        status = <optimized out>
        __func__ = "vaapi_initialize"
#17 0x00007f9086cecb34 in supports_vaapi (fd=6) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:77
        ret = <optimized out>
        va_dpy = 0x55cc60451400
        parent = <optimized out>
        priv = 0x55cc6044c120
        devpath = 0x55cc60451090 "/dev/dri/card0"
        e = 0x55cc6044d000
        fd = 6
        syspath = <optimized out>
        udev = 0x55cc6044ed80
        device = 0x55cc6044e300
        l = 0x55cc60410ca0
        i = <optimized out>
        priv = 0x55cc6044c120
        priv = 0x55cc6044c120
#18 get_default_device_path (display=0x55cc6044c1d0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:140
        parent = <optimized out>
        priv = 0x55cc6044c120
        devpath = 0x55cc60451090 "/dev/dri/card0"
        e = 0x55cc6044d000
        fd = 6
        syspath = <optimized out>
        udev = 0x55cc6044ed80
        device = 0x55cc6044e300
        l = 0x55cc60410ca0
        i = <optimized out>
        priv = 0x55cc6044c120
        priv = 0x55cc6044c120
#19 set_device_path (device_path=<optimized out>, display=0x55cc6044c1d0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:181
        priv = 0x55cc6044c120
        priv = 0x55cc6044c120
#20 gst_vaapi_display_drm_open_display (display=0x55cc6044c1d0, name=<optimized out>) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:247
        priv = 0x55cc6044c120
#21 0x00007f9086cc3eb8 in gst_vaapi_display_create (data=0x0, init_type=GST_VAAPI_DISPLAY_INIT_FROM_DISPLAY_NAME, display=0x55cc6044c1d0) at ../gst-libs/gst/vaapi/gstvaapidisplay.c:958
        info = {display = 0x55cc6044c1d0, display_name = 0x0, va_display = 0x0, native_display = 0x0}
        priv = 0x55cc6044c140
        klass = 0x55cc6044bbe0
        __func__ = "gst_vaapi_display_config"
#22 gst_vaapi_display_config (display=0x55cc6044c1d0, init_type=GST_VAAPI_DISPLAY_INIT_FROM_DISPLAY_NAME, init_value=0x0) at ../gst-libs/gst/vaapi/gstvaapidisplay.c:1265
        __func__ = "gst_vaapi_display_config"
#23 0x00007f9086cf12c5 in gst_vaapi_display_drm_new (device_path=0x0) at ../gst-libs/gst/vaapi/gstvaapidisplay_drm.c:367
        display = <optimized out>
        types = {2, 1, 2261812962}
        i = 1
        num_types = <optimized out>
        device_paths = {0x0, 0x0, 0x7f908754e7a0 <known_licenses>}
#24 0x00007f9086c892cb in gst_vaapi_create_test_display () at ../gst/vaapi/gstvaapipluginutil.c:929
        i = 0
        display = 0x0
        display = <optimized out>
        decoders = <optimized out>
        rank = <optimized out>
        __func__ = "plugin_init"
#25 plugin_init (plugin=0x55cc60439190) at ../gst/vaapi/gstvaapi.c:191
        display = <optimized out>
        decoders = <optimized out>
        rank = <optimized out>
        __func__ = "plugin_init"
#26 0x00007f90874dd512 in gst_plugin_register_func (plugin=plugin@entry=0x55cc60439190, desc=desc@entry=0x7f9086d3c000 <gst_plugin_desc>, user_data=user_data@entry=0x0) at ../gst/gstplugin.c:532
        __func__ = "gst_plugin_register_func"
#27 0x00007f90874e50f7 in _priv_gst_plugin_load_file_for_registry (filename=filename@entry=0x55cc60424ffc "/usr/lib64/gstreamer-1.0/libgstvaapi.so", registry=<optimized out>, registry@entry=0x0, error=error@entry=0x0) at ../gst/gstplugin.c:971
        desc = 0x7f9086d3c000 <gst_plugin_desc>
        plugin = 0x55cc60439190
        symname = <optimized out>
        module = 0x55cc60426e30
        ret = <optimized out>
        ptr = 0x7f9086d3c000 <gst_plugin_desc>
        file_status = {st_dev = 2053, st_ino = 5244013, st_nlink = 1, st_mode = 33261, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 887528, st_blksize = 4096, st_blocks = 1744, st_atim = {tv_sec = 1665343425, tv_nsec = 730456412}, st_mtim = {tv_sec = 1658410763, tv_nsec = 0}, st_ctim = {tv_sec = 1663168673, tv_nsec = 264324775}, __glibc_reserved = {0, 0, 0}}
        new_plugin = 1
        flags = <optimized out>
        __func__ = "_priv_gst_plugin_load_file_for_registry"
#28 0x00007f90874e590e in gst_plugin_load_file (filename=filename@entry=0x55cc60424ffc "/usr/lib64/gstreamer-1.0/libgstvaapi.so", error=error@entry=0x0) at ../gst/gstplugin.c:689
#29 0x00007f90874e60f6 in do_plugin_load (tag=0, filename=0x55cc60424ffc "/usr/lib64/gstreamer-1.0/libgstvaapi.so", l=0x55cc6041cc30) at ../gst/gstpluginloader.c:845
        newplugin = <optimized out>
        chunks = 0x0
        res = 1
        magic = <optimized out>
        packet_len = <optimized out>
        to_read = <optimized out>
        tag = 0
        in = <optimized out>
        res = <optimized out>
        res = <optimized out>
        __func__ = "exchange_packets"
#30 handle_rx_packet (payload_len=<optimized out>, payload=0x55cc60424ffc "/usr/lib64/gstreamer-1.0/libgstvaapi.so", tag=0, pack_type=<optimized out>, l=0x55cc6041cc30) at ../gst/gstpluginloader.c:953
        res = 1
        magic = <optimized out>
        packet_len = <optimized out>
        to_read = <optimized out>
        tag = 0
        in = <optimized out>
        res = <optimized out>
        res = <optimized out>
        __func__ = "exchange_packets"
#31 read_one (l=0x55cc6041cc30) at ../gst/gstpluginloader.c:1123
        magic = <optimized out>
        packet_len = <optimized out>
        to_read = <optimized out>
        tag = 0
        in = <optimized out>
        res = <optimized out>
        res = <optimized out>
        __func__ = "exchange_packets"
#32 exchange_packets (l=l@entry=0x55cc6041cc30) at ../gst/gstpluginloader.c:1151
        res = <optimized out>
        __func__ = "exchange_packets"
#33 0x00007f90874e7238 in _gst_plugin_loader_client_run () at ../gst/gstpluginloader.c:700
        res = 1
        l = 0x55cc6041cc30
        __func__ = "_gst_plugin_loader_client_run"
#34 0x000055cc5fb061e9 in main (argc=<optimized out>, argv=0x7ffe9a959668) at ../libs/gst/helpers/gst-plugin-scanner.c:67
        res = 1
        my_argv = 0x55cc60414830
        my_argc = 1
Detaching from program: /usr/libexec/gstreamer-1.0/gst-plugin-scanner, process 1281

Comment 40 Ray Strode [halfline] 2022-10-10 18:45:11 UTC
So what's happening, I think, is that libva wants to talk to X11 as part of the DRM auth protocol for the VA API. Incidentally, it's missing Wayland support:

./va/drm/va_drm_auth.c:    /* XXX: try to authenticate through Wayland, etc. */

but trying to talk to X11 means having to start Xwayland. mutter does that when it detects activity on the X11 socket. It won't detect that activity though if gnome-shell is in the middle of a synchronous call waiting for gst-plugin-scanner to finish. I don't know why the problem goes away when mesa-va-drivers is installed (maybe it's only doing this in a fallback path?), but since it does go away when mesa-va-drivers is installed, I think the easiest fix is to just add a: 

Requires: mesa-va-drivers

to gstreamer1-vaapi and maybe a 

Recommends: mesa-va-drivers

to whatever package it got split off from (mesa-dri-drivers ?)

Comment 41 Ray Strode [halfline] 2022-10-10 18:45:46 UTC
I just want to add Jiri confirmed on IRC removing gstreamer1-vaapi makes the problem go away.

Comment 42 Ray Strode [halfline] 2022-10-10 19:31:36 UTC
> I don't know why the problem goes away when mesa-va-drivers is installed

So just reading more through the code, I think what's probably happening is that in the working case it's using a DRM render node for VA, which doesn't require the DRM auth stuff. If mesa-va-drivers isn't installed, trying to use the render node probably fails, so it futilely falls back to trying to use the card device and the legacy X11 authentication bits. This hangs because, as mentioned before, it's waiting for Xwayland to start, which won't happen if gnome-shell is blocked in a sync call.

Comment 43 Ray Strode [halfline] 2022-10-10 19:56:01 UTC
I'm doing gstreamer1-vaapi and mesa builds now with my suggestion from comment 40 but the builders seem hung up on s390x (might be related to an outage in the westford lab over the weekend, not sure).

I don't know if it'll eventually complete or i'll have to retry the builds later.

Comment 44 Adam Williamson 2022-10-10 21:09:19 UTC
Thanks for investigating, Ray! We should probably make sure this is in line with the wider issue here, though. If I'm understanding correctly, the point of splitting the drivers out in the first place is to allow that package to be potentially replaced with a different third-party package that might contain things Fedora cannot. Do we know if adding the requirement or recommend will cause any problems with that?

At least, I think we should potentially just do the requires in gstreamer1-vaapi as a minimal fix here and maybe not bundle it with the recommends in mesa, at least until we're sure it's OK in the wider context.

The s390 issue is a general one, there's an outage at the data center where the s390 builders live, AIUI.

Comment 45 Ray Strode [halfline] 2022-10-10 23:40:54 UTC
I actually took a peek at the alternative mesa package here https://www.thefinalzone.net/packages/mesa-freeworld.spec before initiating the builds. note it has:

%package        -n %{srcname}-va-drivers-freeworld
...
Provides:       %{srcname}-va-drivers = %{?epoch:%{epoch}:}%{version}-%{release}
 
So the Recommends should cause no issues. I also think the Recommends is a good idea in general because it keeps dependencies close to as they were before the split, so it's the more surgical change.

Comment 46 Adam Williamson 2022-10-11 06:19:01 UTC
I think the potential issue is that it's harder to swap a package than install one where no package currently exists. If we recommend mesa-va-drivers from mesa-dri-drivers so that basically everyone gets the Fedora one installed on fresh install, it makes it somewhat harder to switch to a third party one. GUI apps don't have an equivalent of `dnf swap` (and even CLI users don't always know about `dnf swap`). Still it's less of an issue if it's just a Recommends, I guess, as apps shouldn't refuse to let you remove the Fedora one in that case.

Comment 47 Kamil Páral 2022-10-11 08:41:33 UTC
I think we should have mesa-va-drivers installed by default (at least in a Workstation), so the Recommends in mesa-dri-drivers sounds reasonable to me. First, we used to have it installed by default, so this approach keeps it the same way. Second, mesa-va-drivers contains VP9 and possibly also AV1 (for the latest hardware) decoders, which is quite important these days (YouTube), and Firefox finally ships with hw accelerated decoding enabled by default. It would be sad to regress on that. Third, while inconvenient, the solution with "dnf swap" and similar when using alternative repos is good enough for the moment, I believe (GNOME Software doesn't show included packages out-of-the-box anyway, you need to install AppStream metadata first, which requires using a cmdline). A better approach than overwriting .so files will be needed anyway to make the maintenance easier, and there are some suggestions proposed already [1]. Overall, the proposed approach in comment 40 seems to me to be the best we can do at the moment.

In the long run, IIUIC, we should have a look whether gnome-shell can initialize gstreamer later (once it's fully started up), or whether gstreamer should have some checks for being used while the lower stack is not fully started yet, in order to avoid this deadlock. Should I file some upstream issues for that?

[1] https://github.com/intel/libva/issues/639

Comment 48 Adam Williamson 2022-10-11 09:00:08 UTC
Sure, that's a good argument. I'll go with that. Marking this as MODIFIED since builds are there, just waiting for the s390x outage to be resolved.

Comment 49 Ray Strode [halfline] 2022-10-11 13:28:00 UTC
i don't think we should change gnome-shell. I don't think doing gst_init() later is really a great option. Synchronous calls at start up are generally fine, synchronous calls in the middle of when a machine is getting used are bad juju. Also, delay doesn't buy us much. If the user doesn't install mesa-va-drivers or they don't start another gstreamer app to handle the registry getting built, just delaying could lead to a session lock up when we finally get around to doing it.
We could put `['--gst-disable-registry-update']` instead of null for the args to gst_init() but then that might break screen recording for fresh accounts until they start a gstreamer app to rebuild the registry, so that's not a great option either.

We could run it *earlier*. If there's no DISPLAY set yet then it's not going to try to talk to the X server, so doing it at the top of main() might work. On first blush, that sort of strikes me as a workaround, though. It might be a pragmatic workaround, but i personally don't see the need if we can avoid it.

I think maybe gstreamer1-vaapi could e.g., avoid falling back to trying /dev/dri/card0 (and legacy x drm auth stuff) if /dev/dri/renderD128 already failed and it's associated with the same pci device as /dev/dri/card0. that would allow it to fail more gracefully when run from gnome-shell if mesa-va-drivers aren't installed, I think. But it's kind of a niche case to handle, and also not obviously a better solution to me than "make the gstreamer plugin that needs va drivers require va drivers" solution we've come up with. 

So I don't think filing a gnome-shell issue upstream is right. You could file an upstream gstreamer bug if you want, but i don't think it's strictly necessary, either.

I really think deps are the best way forward here, personally. Of course i kind of got pulled into this from the side...Jiri pinged me for help. If those closer to mesa... pwalter, airlied, etc don't like the mesa deps, and if those closer to gstreamer e.g. wtay would rather a dep less solution, too, i'm not trying to step on toes or anything... we can come up with a code solution if we need to.
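The "same PCI device" condition from the fallback idea in this comment can be illustrated from the shell by comparing the sysfs parents of the two DRM nodes. This is a rough sketch only: the helper names and the `SYSFS_DRM` override are hypothetical, and it is not how libva or gstreamer1-vaapi actually choose a device.

```shell
#!/bin/sh
# Resolve the device behind a DRM node name such as "renderD128" or "card0".
# SYSFS_DRM is a hypothetical override, present only so the sketch can be
# exercised against a fake directory tree.
drm_parent() {
    readlink -f "${SYSFS_DRM:-/sys/class/drm}/$1/device"
}

# Succeed when both DRM nodes resolve to the same underlying device.
same_device() {
    a=$(drm_parent "$1") && b=$(drm_parent "$2") && [ -n "$a" ] && [ "$a" = "$b" ]
}

if same_device renderD128 card0; then
    echo "renderD128 and card0 share a device; falling back to card0 gains nothing"
else
    echo "renderD128 and card0 differ, or one of them is missing"
fi
```

On a machine with a single GPU, both nodes would normally resolve to the same PCI device, which is the case where skipping the card0 fallback would let the plugin fail early instead of waiting on Xwayland.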

Comment 50 Fedora Update System 2022-10-11 13:35:37 UTC
FEDORA-2022-9ee52e6983 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-9ee52e6983

Comment 51 Ray Strode [halfline] 2022-10-11 13:36:55 UTC
The builds finished and this is a blocker so I've started the update process. In the off chance we do get objections, we can reroll if we need to, of course.

Comment 52 Daniel Rusek 2022-10-11 14:36:33 UTC
(In reply to Kamil Páral from comment #47)
> A better approach than overwriting
> .so files will be needed anyway to make the maintenance easier, and there
> are some suggestions proposed already [1]. Overall, the proposed approach in
> comment 40 seems to me to be the best we can do at the moment.

Here is also a possible solution proposed by RPM Fusion folks: https://bugzilla.rpmfusion.org/show_bug.cgi?id=6426#c36

Comment 53 Fedora Update System 2022-10-11 15:52:50 UTC
FEDORA-2022-9ee52e6983 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-9ee52e6983`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-9ee52e6983

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 54 Fedora Update System 2022-10-13 11:44:41 UTC
FEDORA-2022-9ee52e6983 has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 55 Brian Morrison 2022-10-13 17:47:49 UTC
I have 2 systems that hang before gdm comes up if I install the gstreamer1-vaapi package(s). They have most of the mesa packages installed, including the mesa-va-drivers package(s).

This is happening with mesa-2.21.1-1 and gstreamer1-vaapi-1.20.3-3

Comment 56 Kamil Páral 2022-10-14 11:19:38 UTC
Brian, can you please ssh into the hung system and collect the system journal (`journalctl -b`) and possibly also a gdb backtrace from the running gnome-shell process (`gdb -p PID` and then "set logging enabled on" and "t a a bt full") and attach the output? I assume yours might be a different problem, but we can't tell without any logs. Also attach `rpm -qa | sort`, thanks.

Comment 57 Brian Morrison 2022-10-14 16:17:35 UTC
Are you sure this is the correct command for gdb?

"t a a bt full"

Not an expert but it seems wrong somehow.

Will provide the info in the next day or so.

Comment 58 Brian Morrison 2022-10-14 17:04:33 UTC
Created attachment 1918085 [details]
journalctl -b output

As requested

Comment 59 Brian Morrison 2022-10-14 17:05:23 UTC
Created attachment 1918086 [details]
rpm -qa | sort output

As requested

Comment 60 Brian Morrison 2022-10-14 17:14:43 UTC
I also found and installed all the gstreamer1-*-1.20.4-1 rpms, described as a bugfix release, before collecting the attachments.

Comment 61 Adam Williamson 2022-10-14 20:13:17 UTC
t a a bt full is correct, it's short for 'thread apply all bt full', which means 'get a full backtrace for all threads'.

Comment 62 Brian Morrison 2022-10-14 22:17:23 UTC
Created attachment 1918159 [details]
Hung gnome-shell gdb output

This is the t a a bt full output from gdb after gnome-shell hangs with the gstreamer1-vaapi rpms installed (both i686 and x86_64)

I have no way of knowing whether this is adequate but I did notice a whole lot of debuginfo rpms were listed as missing so if necessary I can install some to get more symbol output.

Comment 63 Adam Williamson 2022-10-15 06:44:50 UTC
yes, that would likely help. Did gdb offer to turn on 'debuginfod'? If so, do that, it will automatically download the required symbols (note the downloads may be quite large, don't do this on a metered connection).
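The backtrace collection described in comments 56 to 63 can also be done non-interactively. A minimal sketch, assuming a gdb new enough to know `set debuginfod enabled` (the function name and default output file are hypothetical):

```shell
#!/bin/sh
# Write a full backtrace of every thread in process $1 to file $2.
# -batch makes gdb exit once the commands have run, and
# "thread apply all bt full" is the long form of "t a a bt full".
# The -iex command turns on debuginfod before gdb attaches, so missing
# symbols are downloaded automatically (the downloads can be large).
dump_backtrace() {
    pid=$1
    out=${2:-backtrace.txt}
    gdb -iex 'set debuginfod enabled on' \
        -p "$pid" -batch \
        -ex 'thread apply all bt full' > "$out" 2>&1
}
```

Called over ssh as root, for example `dump_backtrace PID gnome-shell.txt` with PID being the hung gnome-shell process, this leaves a file that can be attached to the bug.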

Comment 64 Ray Strode [halfline] 2022-10-27 14:20:37 UTC
So just to follow up, in comment 49 I advocated against a gnome-shell fix, but one got filed upstream independently anyway:

https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/5710

Comment 65 Ray Strode [halfline] 2022-10-27 14:43:11 UTC
Brian, does installing the libva-intel-driver package from rpmfusion fix things for you?

Comment 66 Brian Morrison 2022-10-27 16:36:56 UTC
Installed libva-intel-driver-2.4.1-9.fc37.x86_64 from rpmfusion on one of my machines, then installed the gstreamer1-vaapi for both x86_64 and i686.

After a reboot gdm never gets to the point of showing a login prompt or user box to click on.

Installed libva-intel-driver-2.4.1-9.fc37.i686 as well, reinstalled gstreamer1-vaapi for both x86_64 and i686.

After another reboot gdm never gets to the point of showing a login prompt or user box to click on.

Uninstalled gstreamer1-vaapi for both x86_64 and i686, leaving libva-intel-driver installed.

Reboot, gdm back to normal with user selection shown.

Comment 67 Brian Morrison 2022-10-27 16:39:01 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2123998#c65

Do I need to try uninstalling the mesa-va-drivers packages?

Comment 68 Brian Morrison 2022-11-19 14:31:57 UTC
After installing more updates as they went into updates-testing or pending->testing I can now install the gstreamer1-vaapi packages without seeing hangs during gdm startup.

It's fixed, but I don't know exactly how.

