Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1817368 - Random hangs with sof-audio-pci
Summary: Random hangs with sof-audio-pci
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: alsa-sof-firmware
Version: 31
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-26 08:47 UTC by Pierre Ossman
Modified: 2020-06-14 17:10 UTC (History)
20 users (show)

Fixed In Version: alsa-sof-firmware-1.5-1.fc31
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-14 17:10:24 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dmesg (89.44 KB, text/plain)
2020-03-26 08:47 UTC, Pierre Ossman
no flags Details
dmesg-2.txt (74.14 KB, text/plain)
2020-04-07 09:05 UTC, Samuel
no flags Details

Description Pierre Ossman 2020-03-26 08:47:59 UTC
Created attachment 1673722 [details]
dmesg

1. Please describe the problem:

The machine will randomly lock up at times.

2. What is the Version-Release number of the kernel:

kernel-5.5.10-200.fc31.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, in Fedora 30. Unknown which version at the time.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

No. Not sure what triggers it. Happens several times a week though. Sometimes several times a day.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Not tested.

6. Are you running any modules that not shipped with directly Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Attached output of "sudo journalctl -b --no-hostname -k > tmp/dmesg.txt"

Comment 1 Pierre Ossman 2020-03-26 09:01:34 UTC
The reason we suspect sof-audio-pci is because:

1. These are the only suspicious lines we see at the end of the log before each crash:

> Mar 26 09:56:01 aleta kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
> Mar 26 09:56:01 aleta kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete

Note that they also occur at other times when the system doesn't lock up.

2. This started happening after we installed alsa-firmware. Audio was non-functional before that.

Comment 2 Pierre Ossman 2020-03-26 09:02:04 UTC
Some googling found this:

https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/1847250

Are those patches and firmware updates in Fedora 31?

Comment 3 Steve 2020-03-27 22:58:24 UTC
Bug 1787387, Comment 16, has a suggested test. I think he means to run those commands in separate terminal windows.

Comment 4 Steve 2020-03-27 23:42:06 UTC
(In reply to Pierre Ossman from comment #1)
...
> 2. This started happening after we installed alsa-firmware. Audio was non-functional before that.

I suggest changing the component to alsa-firmware.
> The reason we suspect sof-audio-pci is because:
> 
> 1. These are the only suspicious lines we see at the end of the log before
> each crash:
> 
> > Mar 26 09:56:01 aleta kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
> > Mar 26 09:56:01 aleta kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete
> 
> Note that they also occur at other times when the system doesn't lock up.
>

Comment 5 Steve 2020-03-27 23:50:32 UTC
And please post the output from:

$ rpm -qa alsa\* | sort

Comment 6 Pierre Ossman 2020-03-30 07:52:09 UTC
We've gotten a bit of corona here this week, but we'll get your questions sorted out as soon as we can.

Comment 7 Pierre Ossman 2020-04-01 08:01:43 UTC
(In reply to Steve from comment #3)
> Bug 1787387, Comment 16, has a suggested test. I think he means to run those
> commands in separate terminal windows.

Unfortunately we couldn't provoke a hang using those commands.

What we can see though is that logging in and out somewhat reliable hangs the system at some point (usually right after logout).

(In reply to Steve from comment #5)
> And please post the output from:
> 
> $ rpm -qa alsa\* | sort

alsa-firmware-1.2.1-4.fc31.noarch
alsa-lib-1.2.2-1.fc31.i686
alsa-lib-1.2.2-1.fc31.x86_64
alsa-plugins-pulseaudio-1.2.2-1.fc31.x86_64
alsa-tools-firmware-1.2.2-1.fc31.x86_64
alsa-ucm-1.2.2-1.fc31.noarch
alsa-utils-1.2.2-1.fc31.x86_64

Comment 8 Steve 2020-04-03 17:13:26 UTC
(In reply to Pierre Ossman from comment #7)
> (In reply to Steve from comment #3)
> > Bug 1787387, Comment 16, has a suggested test. I think he means to run those
> > commands in separate terminal windows.
> 
> Unfortunately we couldn't provoke a hang using those commands.
> 
> What we can see though is that logging in and out somewhat reliable hangs the system at some point (usually right after logout).

Thanks for reporting that. That could be a clue. Your system is booting with:

Mar 26 10:15:19 kernel: Kernel is locked down from EFI Secure Boot mode; see man kernel_lockdown.7

And the last line in the log is:

Mar 26 09:40:40 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7

> (In reply to Steve from comment #5)
> > And please post the output from:
> > 
> > $ rpm -qa alsa\* | sort
> 
> alsa-firmware-1.2.1-4.fc31.noarch
> alsa-lib-1.2.2-1.fc31.i686
> alsa-lib-1.2.2-1.fc31.x86_64
> alsa-plugins-pulseaudio-1.2.2-1.fc31.x86_64
> alsa-tools-firmware-1.2.2-1.fc31.x86_64
> alsa-ucm-1.2.2-1.fc31.noarch
> alsa-utils-1.2.2-1.fc31.x86_64

Thanks. Do you know whether this i686 package is needed:

alsa-lib-1.2.2-1.fc31.i686

Also, there is a newer version of alsa-firmware available:

$ dnf -q repoquery alsa-firmware --repo=updates-testing
alsa-firmware-0:1.2.1-5.fc31.noarch

Update with:

# dnf update alsa-firmware --enablerepo=updates-testing

Comment 9 Steve 2020-04-03 17:37:03 UTC
(In reply to Steve from comment #8)
...
> Update with:
> 
> # dnf update alsa-firmware --enablerepo=updates-testing

Let's get a kernel update too:

# dnf list alsa-firmware kernel --enablerepo=updates-testing --refresh

NB: There is another kernel in the pipeline:

kernel-5.5.15-200.fc31, kernel-headers-5.5.15-200.fc31, & 1 more
https://bodhi.fedoraproject.org/updates/FEDORA-2020-666f3b1ac3

Comment 10 Steve 2020-04-03 17:37:52 UTC
# dnf update alsa-firmware kernel --enablerepo=updates-testing --refresh
      ^^^^^^

Comment 11 Pierre Ossman 2020-04-06 09:05:07 UTC
We got a second machine experiencing this now after a kernel upgrade (same alsa-firmware though). It gets the same "no reply expected" errors, but it also got this:

> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel:  HDA Analog: ASoC: hw_params BE failed -110
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel:  Analog Playback and Capture: ASoC: hw_params BE failed -110
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: ASoC: can't set Analog CPU DAI hw params: -110
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: failed to set dai config for Analog Playback and Capture
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: host status 0x00000000 dsp status 0x00000000 mask 0x00000003
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: dsp irq ppsts 0x00000000 adspis 0x00000000
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: hda irq intsts 0x00000000 intlctl 0xc0000180 rirb 00
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: unexpected fault 0x00000000 trace 0x00000005
> Apr 06 08:35:32 samuel.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: error: ipc timed out for 0x80010000 size 120

This machine also tends to lock up directly on boot, not just on logout.

We tested upgrading to that kernel and alsa-firmware from updates-testing, but unfortunately the problem remains.

Also note that reverting to the old driver makes the problem go away, so that's another thing pointing at the sof driver being the culprit.

Comment 12 Steve 2020-04-06 16:26:57 UTC
Thanks for testing with the newer kernel and firmware.

Could you please attach a log for the latest kernel and firmware. If you have to reboot, please attach the log from the previous boot:

$ journalctl -b -1 -k --no-hostname > dmesg-2.txt
                ^^-- selects the log for the previous boot

Also, I believe this bug would be better placed under one of the "alsa" components. This looks like a good choice: "alsa-sof-firmware".

BTW, this looks similar:

Bug 1812415 - sof-audio-pci panic for intel ice lake, probably at resume

Comment 13 Samuel 2020-04-07 09:05:05 UTC
Created attachment 1676864 [details]
dmesg-2.txt

Added an attachment with the logs from the previous boot with the 5.5.15 kernel (which locked up when booting).

Comment 14 Steve 2020-04-07 16:55:48 UTC
(In reply to Samuel from comment #13)
> Created attachment 1676864 [details]
> dmesg-2.txt
> 
> Added an attachment with the logs from the previous boot with the 5.5.15 kernel (which locked up when booting).

Thanks. The log is truncated, so we will have to try getting something from the display instead.

Try removing "rhgb quiet" from the kernel command-line in grub2. If it hangs with any messages on the display, please take a photo and attach it.

If you don't see the grub2 menu while booting, try repeatedly tapping the "Esc" key while booting.

Comment 15 Samuel 2020-04-08 20:43:45 UTC
Unfortunately it hangs with a blank screen, I can't get it do display the text it ends with.

I made a video: https://www.youtube.com/watch?v=0ezKDxfUq7Q&feature=youtu.be

Comment 16 Steve 2020-04-09 00:25:01 UTC
(In reply to Samuel from comment #15)
> Unfortunately it hangs with a blank screen, I can't get it do display the text it ends with.
> 
> I made a video: https://www.youtube.com/watch?v=0ezKDxfUq7Q&feature=youtu.be

Thanks. I played the video at 0.25 speed and in full-screen mode. The compression artifacts make the text nearly unreadable, but I believe that it can be interpreted as follows:

The lines with "Starting" and "Started" are from systemd. Near the end, I believe I see:

"Started Command Scheduler." (I copied that text from the "journalctl -b" output in my F31 VM.)

Then there are five lines, with the first line containing the text "available by default".

A search of your first log finds what I believe is that line. I believe that the last line visible in the video is the one identified below.

$ fgrep -A26 -n 'no longer available by default.' dmesg-1.txt 
1033:Mar 26 09:15:29 kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
1034-Mar 26 09:15:29 kernel: tun: Universal TUN/TAP device driver, 1.6
1035-Mar 26 09:15:29 kernel: virbr0: port 1(virbr0-nic) entered blocking state
1036-Mar 26 09:15:29 kernel: virbr0: port 1(virbr0-nic) entered disabled state
1037-Mar 26 09:15:29 kernel: device virbr0-nic entered promiscuous mode          <<<<< Last line displayed in video before display goes blank.
1038-Mar 26 09:15:29 kernel: virbr0: port 1(virbr0-nic) entered blocking state
1039-Mar 26 09:15:29 kernel: virbr0: port 1(virbr0-nic) entered listening state
1040-Mar 26 09:15:29 kernel: virbr0: port 1(virbr0-nic) entered disabled state
1041-Mar 26 09:15:29 kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
1042-Mar 26 09:15:29 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
1043-Mar 26 09:15:29 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
1044-Mar 26 09:15:29 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
1045-Mar 26 09:15:31 kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
1046-Mar 26 09:15:31 kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete
1047-Mar 26 09:15:32 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
1048-Mar 26 09:15:32 kernel: rfkill: input handler disabled
1049-Mar 26 09:15:33 kernel: FS-Cache: Loaded
1050-Mar 26 09:15:33 kernel: FS-Cache: Netfs 'nfs' registered for caching
1051-Mar 26 09:15:47 kernel: rfkill: input handler enabled
1052-Mar 26 09:15:47 kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
1053-Mar 26 09:15:47 kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete
1054-Mar 26 09:15:48 kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
1055-Mar 26 09:15:48 kernel: rfkill: input handler disabled
1056-Mar 26 09:19:54 kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
1057-Mar 26 09:19:54 kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete
1058-Mar 26 09:22:15 kernel: sof-audio-pci 0000:00:1f.3: error: no reply expected, received 0x0
1059-Mar 26 09:22:15 kernel: sof-audio-pci 0000:00:1f.3: firmware boot complete

Comment 17 Steve 2020-04-09 01:08:16 UTC
(In reply to Samuel from comment #15)
...
> I made a video: https://www.youtube.com/watch?v=0ezKDxfUq7Q&feature=youtu.be

For the record, YouTube videos can be stepped frame-by-frame with the "." (forward) and "," (backward) keys. Pause the video near where you want to view each frame before stepping.

Thanks to this web page:

Frame by frame control of a YouTube video?
https://webapps.stackexchange.com/questions/6507/frame-by-frame-control-of-a-youtube-video

Comment 18 Steve 2020-04-09 02:36:04 UTC
Could you please post the output from:

$ lsmod

That will give us a list of kernel modules that we can try blacklisting. Unfortunately, sof-audio-pci is not a kernel module:

$ modinfo sof-audio-pci
modinfo: ERROR: Module sof-audio-pci not found.

If we can find a "sof" kernel module name, it would be blacklisted on the kernel command-line by adding something like this:

module_blacklist=xxx-yyy
                 ^^^^^^^ The name of a kernel module would go there.

Documentation:

The kernel’s command-line parameters
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

For reference, here is a list of installed "sof" kernel modules from my F31 VM:

$ find /lib/modules/5.5.15-200.fc31.x86_64 -wholename \*/sof/\*
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/xtensa
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/xtensa/snd-sof-xtensa-dsp.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/intel
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/intel/snd-sof-intel-byt.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/intel/snd-sof-intel-hda-common.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/intel/snd-sof-intel-hda.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/intel/snd-sof-intel-ipc.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/snd-sof-acpi.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/snd-sof.ko.xz
/lib/modules/5.5.15-200.fc31.x86_64/kernel/sound/soc/sof/snd-sof-pci.ko.xz

Comment 19 Steve 2020-04-09 03:04:03 UTC
(In reply to Pierre Ossman from comment #11)
...
> Also note that reverting to the old driver makes the problem go away, ...

Could you clarify what you mean by "reverting to the old driver"? Are you referring to the firmware? If so, please post the version info for the firmware packages that work for you:

$ rpm -q alsa-firmware linux-firmware

Comment 20 Pierre Ossman 2020-04-09 06:33:55 UTC
(In reply to Steve from comment #19)
> 
> Could you clarify what you mean by "reverting to the old driver"? Are you
> referring to the firmware? If so, please post the version info for the
> firmware packages that work for you:
> 

No. We're reverting to snd_hda_intel by specifying:

> options snd-intel-dspcfg dsp_driver=1

Comment 21 Pierre Ossman 2020-04-09 06:35:08 UTC
(In reply to Steve from comment #18)
> Could you please post the output from:
> 
> $ lsmod
> 

Sure:

$ lsmod
Module                  Size  Used by
rfcomm                 90112  4
nfsv3                  53248  1
nfs_acl                16384  1 nfsv3
nfs                   364544  2 nfsv3
lockd                 122880  2 nfsv3,nfs
grace                  16384  1 lockd
fscache               397312  1 nfs
uinput                 20480  0
xt_CHECKSUM            16384  1
xt_MASQUERADE          20480  3
nf_nat_tftp            16384  0
nf_conntrack_tftp      20480  3 nf_nat_tftp
tun                    57344  1
bridge                208896  0
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
nf_conntrack_netbios_ns    16384  1
nf_conntrack_broadcast    16384  1 nf_conntrack_netbios_ns
xt_CT                  16384  3
ip6t_REJECT            16384  12
nf_reject_ipv6         20480  1 ip6t_REJECT
ip6t_rpfilter          16384  1
ipt_REJECT             16384  5
nf_reject_ipv4         16384  1 ipt_REJECT
xt_conntrack           16384  44
ebtable_nat            16384  1
ebtable_broute         16384  1
ip6table_nat           16384  1
ip6table_mangle        16384  1
ip6table_raw           16384  1
ip6table_security      16384  1
iptable_nat            16384  1
nf_nat                 53248  4 ip6table_nat,nf_nat_tftp,iptable_nat,xt_MASQUERADE
iptable_mangle         16384  1
iptable_raw            16384  1
iptable_security       16384  1
nf_conntrack          163840  8 xt_conntrack,nf_nat,nf_conntrack_tftp,nf_conntrack_netbios_ns,nf_nat_tftp,nf_conntrack_broadcast,xt_CT,xt_MASQUERADE
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
libcrc32c              16384  2 nf_conntrack,nf_nat
ip_set                 57344  0
nfnetlink              16384  1 ip_set
ebtable_filter         16384  1
ebtables               40960  3 ebtable_nat,ebtable_filter,ebtable_broute
ip6table_filter        16384  1
ip6_tables             36864  7 ip6table_filter,ip6table_raw,ip6table_nat,ip6table_mangle,ip6table_security
iptable_filter         16384  1
cmac                   16384  9
bnep                   28672  2
snd_sof_pci            20480  0
snd_sof_intel_byt      20480  1 snd_sof_pci
snd_sof_intel_ipc      20480  1 snd_sof_intel_byt
snd_sof_xtensa_dsp     16384  1 snd_sof_pci
snd_sof_intel_hda_common    81920  1 snd_sof_pci
snd_hda_codec_hdmi     73728  1
snd_soc_hdac_hda       24576  1 snd_sof_intel_hda_common
snd_sof_intel_hda      20480  1 snd_sof_intel_hda_common
snd_sof               126976  4 snd_sof_pci,snd_sof_intel_hda_common,snd_sof_intel_byt,snd_sof_intel_ipc
intel_rapl_msr         20480  0
sunrpc                491520  18 lockd,nfsv3,nfs_acl,nfs
intel_rapl_common      32768  1 intel_rapl_msr
snd_soc_skl           180224  0
snd_soc_sst_ipc        20480  1 snd_soc_skl
snd_hda_codec_realtek   126976  1
snd_soc_sst_dsp        40960  1 snd_soc_skl
snd_hda_ext_core       36864  4 snd_sof_intel_hda_common,snd_soc_hdac_hda,snd_soc_skl,snd_sof_intel_hda
x86_pkg_temp_thermal    20480  0
snd_soc_acpi_intel_match    32768  3 snd_sof_pci,snd_sof_intel_hda_common,snd_soc_skl
intel_powerclamp       20480  0
snd_soc_acpi           16384  3 snd_sof_pci,snd_soc_acpi_intel_match,snd_soc_skl
snd_hda_codec_generic    94208  1 snd_hda_codec_realtek
coretemp               20480  0
vfat                   20480  1
ledtrig_audio          16384  3 snd_hda_codec_generic,snd_hda_codec_realtek,snd_sof
snd_soc_core          294912  4 snd_sof,snd_sof_intel_hda_common,snd_soc_hdac_hda,snd_soc_skl
fat                    86016  1 vfat
kvm_intel             311296  0
snd_compress           28672  1 snd_soc_core
ac97_bus               16384  1 snd_soc_core
snd_pcm_dmaengine      16384  1 snd_soc_core
snd_hda_intel          53248  3
snd_intel_dspcfg       28672  4 snd_hda_intel,snd_sof_pci,snd_sof_intel_hda_common,snd_soc_skl
btusb                  57344  0
snd_hda_codec         163840  5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek,snd_soc_hdac_hda
btrtl                  24576  1 btusb
kvm                   802816  1 kvm_intel
mei_hdcp               24576  0
mei_wdt                16384  0
btbcm                  16384  1 btusb
iTCO_wdt               16384  0
btintel                32768  1 btusb
iTCO_vendor_support    16384  1 iTCO_wdt
irqbypass              16384  1 kvm
bluetooth             634880  33 btrtl,btintel,btbcm,bnep,btusb,rfcomm
snd_hda_core          102400  10 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_ext_core,snd_hda_codec,snd_hda_codec_realtek,snd_sof_intel_hda_common,snd_soc_hdac_hda,snd_soc_skl,snd_sof_intel_hda
crct10dif_pclmul       16384  1
snd_hwdep              16384  1 snd_hda_codec
crc32_pclmul           16384  0
snd_seq                86016  0
snd_seq_device         16384  1 snd_seq
ghash_clmulni_intel    16384  0
snd_pcm               126976  9 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_sof,snd_sof_intel_hda_common,snd_soc_core,snd_soc_skl,snd_hda_core,snd_pcm_dmaengine
intel_cstate           20480  0
intel_uncore          147456  0
ecdh_generic           16384  2 bluetooth
rtsx_usb_ms            24576  0
rfkill                 28672  4 bluetooth
intel_rapl_perf        20480  0
snd_timer              40960  2 snd_seq,snd_pcm
wmi_bmof               16384  0
intel_wmi_thunderbolt    20480  0
i2c_i801               32768  0
memstick               24576  1 rtsx_usb_ms
ecc                    36864  1 ecdh_generic
snd                   102400  18 snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_compress,snd_soc_core,snd_pcm
joydev                 28672  0
soundcore              16384  1 snd
mei_me                 45056  2
mei                   122880  5 mei_wdt,mei_hdcp,mei_me
ie31200_edac           16384  0
acpi_pad              184320  0
ip_tables              32768  5 iptable_filter,iptable_security,iptable_raw,iptable_nat,iptable_mangle
rtsx_usb_sdmmc         32768  0
mmc_core              184320  1 rtsx_usb_sdmmc
hid_logitech_hidpp     45056  0
hid_microsoft          16384  0
rtsx_usb               28672  2 rtsx_usb_sdmmc,rtsx_usb_ms
ff_memless             20480  1 hid_microsoft
i915                 2453504  5
i2c_algo_bit           16384  1 i915
drm_kms_helper        233472  1 i915
nvme                   49152  3
nvme_core             110592  5 nvme
drm                   585728  6 drm_kms_helper,i915
e1000e                290816  0
crc32c_intel           24576  5
wmi                    36864  2 intel_wmi_thunderbolt,wmi_bmof
video                  53248  1 i915
hid_logitech_dj        28672  0
fuse                  139264  1

Comment 22 Steve 2020-04-09 10:46:39 UTC
(In reply to Pierre Ossman from comment #20)
...
> No. We're reverting to snd_hda_intel by specifying:
> 
> > options snd-intel-dspcfg dsp_driver=1

Thanks. I had to look that up. Where are you specifying that? And where did you get that option?

For the record:

$ modinfo snd_intel_dspcfg | egrep 'description|parm'
description:    Intel DSP config driver
parm:           dsp_driver:Force the DSP driver for Intel DSP (0=auto, 1=legacy, 2=SST, 3=SOF) (int)

Comment 23 Pierre Ossman 2020-04-09 11:06:49 UTC
(In reply to Steve from comment #22)
> (In reply to Pierre Ossman from comment #20)
> ...
> > No. We're reverting to snd_hda_intel by specifying:
> > 
> > > options snd-intel-dspcfg dsp_driver=1
> 
> Thanks. I had to look that up. Where are you specifying that? And where did
> you get that option?
> 

We put it in /etc/modprobe.d/snd_intel.conf. Found it by examining snd_hda_intel and figuring out why it wasn't loading. Its modinfo output refers to that new setting:

> parm:           dmic_detect:Allow DSP driver selection (bypass this driver) (0=off, 1=on) (default=1); deprecated, use snd-intel-dspcfg.dsp_driver option instead (bool)

Comment 24 Steve 2020-04-09 11:46:59 UTC
(In reply to Pierre Ossman from comment #23)
> (In reply to Steve from comment #22)
> > (In reply to Pierre Ossman from comment #20)
> > ...
> > > No. We're reverting to snd_hda_intel by specifying:
> > > 
> > > > options snd-intel-dspcfg dsp_driver=1
> > 
> > Thanks. I had to look that up. Where are you specifying that? And where did you get that option?
> > 
> 
> We put it in /etc/modprobe.d/snd_intel.conf. Found it by examining snd_hda_intel and 
> figuring out why it wasn't loading. Its modinfo output refers to that new setting:
> 
> > parm:           dmic_detect:Allow DSP driver selection (bypass this driver) (0=off, 1=on) (default=1); deprecated, use snd-intel-dspcfg.dsp_driver option instead (bool)

Thanks. Could you post:

$ cat /etc/modprobe.d/snd_intel.conf

Have you added any other custom files to /etc/modprobe.d/ ?

Comment 25 Steve 2020-04-09 11:52:35 UTC
(In reply to Pierre Ossman from comment #21)
...
> Sure:
> 
> $ lsmod
> Module                  Size  Used by
...

Thanks. That gives us a lot of choices for blacklisting. :-)

For blacklist testing, please disable the option in /etc/modprobe.d/snd_intel.conf:

# options snd-intel-dspcfg dsp_driver=1

Please try adding this to the kernel command-line from grub2 and seeing if the hang occurs:

module_blacklist=snd_sof_pci

For the record:

$ grep '^snd' lsmod-1.txt | wc -l
30

$ grep '^snd.*0$' lsmod-1.txt
snd_sof_pci            20480  0
snd_soc_skl           180224  0
snd_seq                86016  0

$ sort lsmod-1.txt | grep pci
snd_intel_dspcfg       28672  4 snd_hda_intel,snd_sof_pci,snd_sof_intel_hda_common,snd_soc_skl
snd_soc_acpi           16384  3 snd_sof_pci,snd_soc_acpi_intel_match,snd_soc_skl
snd_soc_acpi_intel_match    32768  3 snd_sof_pci,snd_sof_intel_hda_common,snd_soc_skl
snd_sof               126976  4 snd_sof_pci,snd_sof_intel_hda_common,snd_sof_intel_byt,snd_sof_intel_ipc
snd_sof_intel_byt      20480  1 snd_sof_pci
snd_sof_intel_hda_common    81920  1 snd_sof_pci
snd_sof_pci            20480  0
snd_sof_xtensa_dsp     16384  1 snd_sof_pci

Comment 26 Steve 2020-04-09 14:48:16 UTC
This is getting out of my lane, but it might help debug the problem. Could you post the output from:

$ lspci -nn # That will show "Show PCI vendor and device codes as both numbers and names." 

I believe the relevant kernel source file is:

sound/soc/sof/sof-pci-dev.c

This is a commit that changed the names of certain firmware files in that file:

ASoC: SOF: Intel: Fix CFL and CML FW nocodec binary names.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/sound/soc/sof/sof-pci-dev.c?h=linux-5.5.y&id=653c28afa76b45c570370c3c3a89975c68c5fc8e

But when I look at what alsa-firmware has in /lib/firmware/intel/sof, there is this pair of links:

sof-cfl.ri -> sof-cnl.ri
sof-cnl.ri -> signed/sof-cnl-signed-intel.ri

If I am interpreting the patch correctly, it has no effect whatsoever, because sof-cfl.ri and sof-cnl.ri both go to the same file.

$ rpm -q alsa-firmware
alsa-firmware-1.2.1-5.fc31.noarch

Here is a bigger view of the patch. Specifically, look at the block under CONFIG_SND_SOC_SOF_COFFEELAKE:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/sound/soc/sof/sof-pci-dev.c?h=linux-5.5.y&id=653c28afa76b45c570370c3c3a89975c68c5fc8e&context=8&ignorews=0&dt=1

Comment 27 Steve 2020-04-09 15:04:31 UTC
To add to the confusion, the log doesn't show the actual name of the firmware file that the code is loading, unless something goes wrong.

From the attached dmesg.txt:

Mar 26 09:15:22 kernel: sof-audio-pci 0000:00:1f.3: Firmware info: version 1:1:0-fcf6c
Mar 26 09:15:22 kernel: sof-audio-pci 0000:00:1f.3: Firmware: ABI 3:11:0 Kernel ABI 3:11:0

There is no firmware file name in that!

However, an earlier bug report shows the firmware file name:

Jan 15 11:16:45 ossman.lkpg.cendio.se kernel: sof-audio-pci 0000:00:1f.3: Direct firmware load for intel/sof/sof-cfl.ri failed with error -2
                                                                                                             ^^^^^^^^^^

That log line is from:

Bug 1791597 - alsa-firmware missing on upgrade

Comment 28 Samuel 2020-04-27 08:28:52 UTC
(In reply to Steve from comment #24)
> Thanks. Could you post:
> 
> $ cat /etc/modprobe.d/snd_intel.conf
> 
> Have you added any other custom files to /etc/modprobe.d/ ?

$ cat /etc/modprobe.d/snd_intel.conf 
options snd-intel-dspcfg dsp_driver=1

No other custom files in /etc/modprobe.d/:

$ ls /etc/modprobe.d/
blacklist-visor.conf    kvm.conf    mlx4.conf       truescale.conf
firewalld-sysctls.conf  lockd.conf  snd_intel.conf

Comment 29 Samuel 2020-04-27 08:54:16 UTC
(In reply to Steve from comment #25)
> For blacklist testing, please disable the option in
> /etc/modprobe.d/snd_intel.conf:
> 
> # options snd-intel-dspcfg dsp_driver=1
> 
> Please try adding this to the kernel command-line from grub2 and seeing if
> the hang occurs:
> 
> module_blacklist=snd_sof_pci

That helped.

First I disabled the the workaround in /etc/modprobe.d/snd_intel.conf:

$ cat /etc/modprobe.d/snd_intel.conf 
#options snd-intel-dspcfg dsp_driver=1

And I could verify that the hang still happens with the updated kernel:

$ uname -r
5.6.6-200.fc31.x86_64

I killed the power and booted up again. While booting the second time, I edited the kernel command-line to add "module_blacklist=snd_sof_pci" and then the machine booted without problems, no hang.

Comment 30 Samuel 2020-04-27 08:55:35 UTC
(In reply to Steve from comment #26)
> This is getting out of my lane, but it might help debug the problem. Could
> you post the output from:
> 
> $ lspci -nn # That will show "Show PCI vendor and device codes as both
> numbers and names." 

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec2] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 630 (Desktop) [8086:3e92]
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
00:16.3 Serial controller [0700]: Intel Corporation Cannon Lake PCH Active Management Technology - SOL [8086:a363] (rev 10)
00:17.0 SATA controller [0106]: Intel Corporation Cannon Lake PCH SATA AHCI Controller [8086:a352] (rev 10)
00:1b.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 [8086:a32c] (rev f0)
00:1f.0 ISA bridge [0601]: Intel Corporation Q370 Chipset LPC/eSPI Controller [8086:a306] (rev 10)
00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-LM [8086:15bb] (rev 10)
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

Comment 31 Steve 2020-04-27 15:36:57 UTC
(In reply to Samuel from comment #29)
> ... I edited the kernel command-line to add "module_blacklist=snd_sof_pci" and then the machine booted without problems, no hang.

Thanks for your careful testing and thorough documentation. That, at least, provides another workaround.

> 00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
                                                                            ^^^^

Thanks for posting the lspci output. Despite what everything else says, "a348" is Coffee Lake, which correlates with the name of the firmware file: sof-cfl.ri. (Comment 27)

==
$ git grep -C1 'a348' sound/soc/sof/sof-pci-dev.c
sound/soc/sof/sof-pci-dev.c-#if IS_ENABLED(CONFIG_SND_SOC_SOF_COFFEELAKE)
sound/soc/sof/sof-pci-dev.c:    { PCI_DEVICE(0x8086, 0xa348),
sound/soc/sof/sof-pci-dev.c-            .driver_data = (unsigned long)&cfl_desc},

$ git log --oneline -n1 HEAD
6a8b55ed40 (HEAD -> master, tag: v5.7-rc3, origin/master, origin/HEAD) Linux 5.7-rc3

Comment 32 Steve 2020-04-27 15:49:34 UTC
This would probably get more attention under an alsa component, so I suggest:

1. Changing the bug summary to:

"snd_sof_pci: boot hang with Coffee Lake [8086:a348] and 5.6.6-200.fc31.x86_64"

2. Changing the component to: "alsa-sof-firmware".

For the record:

$ grep -B9 'sound/soc/sof' MAINTAINERS
SOUND - SOUND OPEN FIRMWARE (SOF) DRIVERS
...
L:	sound-open-firmware (moderated for non-subscribers)
S:	Supported
W:	https://github.com/thesofproject/linux/
F:	sound/soc/sof/

Comment 33 Pierre Ossman 2020-04-29 08:22:33 UTC
Reported to the sof project:

https://github.com/thesofproject/linux/issues/2066

Comment 34 Samuel 2020-04-29 08:35:26 UTC
(In reply to Samuel from comment #29)
> > ...
> >
> > Please try adding this to the kernel command-line from grub2 and seeing if
> > the hang occurs:
> > 
> > module_blacklist=snd_sof_pci
>
> ...
> 
> I killed the power and booted up again. While booting the second time, I
> edited the kernel command-line to add "module_blacklist=snd_sof_pci" and
> then the machine booted without problems, no hang.

Note that this method causes audio to stop working.

Comment 35 Steve 2020-04-29 17:16:23 UTC
I just looked again at the alsa-firmware updates and can see that the F31 firmware is *older* than the F32 firmware. Further, there is a separate alsa-sof-firmware package:

https://bodhi.fedoraproject.org/updates/?packages=alsa-firmware

Installing these might be needed:

alsa-firmware-1.2.1-7.fc32 and alsa-sof-firmware-1.4.2-4.fc32 
https://bodhi.fedoraproject.org/updates/FEDORA-2020-0a7be7437f

Jaroslav: Is there any reason why those can't be released for F31?

Comment 36 Steve 2020-04-30 12:53:56 UTC
(In reply to Samuel from comment #29)
...
> And I could verify that the hang still happens with the updated kernel:
> 
> $ uname -r
> 5.6.6-200.fc31.x86_64
...

Ranjani, upstream, cites commit 8354d9b44530b.*

However, that commit is not yet in a kernel stable release:

$ git describe 8354d9b44530b
fatal: Not a valid object name 8354d9b44530b

$ git log --oneline --no-decorate -n1
63c3d4974 Linux 5.6.8

So you would need to test with a "rawhide" kernel (currently F33) from here:

https://bodhi.fedoraproject.org/updates/?packages=kernel

See, however:

Bug 1825940 - kernel-core-5.7.0-0.rc1.20200416git9786cab67457.1 took very long time to install 

* https://github.com/thesofproject/linux/issues/2066#issuecomment-621309596

Comment 37 Steve 2020-05-04 16:48:03 UTC
It looks like Jaroslav has just completed a build of alsa-sof-firmware for F31:

alsa-sof-firmware
https://koji.fedoraproject.org/koji/packageinfo?packageID=30869

Information for build alsa-sof-firmware-1.4.2-5.fc31
https://koji.fedoraproject.org/koji/buildinfo?buildID=1502856

Thank-you, Jaroslav.

Upstream reference:

https://github.com/thesofproject/linux/issues/2066#issuecomment-623534966

Comment 38 Steve 2020-06-01 15:36:54 UTC
alsa-sof-firmware-1.5-1 packages have been built:

https://koji.fedoraproject.org/koji/packageinfo?packageID=30869

Jaroslav: Could you push the F31 package:

https://bodhi.fedoraproject.org/updates/?packages=alsa-sof-firmware

Comment 39 Fedora Update System 2020-06-01 16:04:15 UTC
FEDORA-2020-18cc548263 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-18cc548263

Comment 40 Fedora Update System 2020-06-02 04:00:26 UTC
FEDORA-2020-18cc548263 has been pushed to the Fedora 31 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-18cc548263`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-18cc548263

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 41 Samuel 2020-06-03 08:56:58 UTC
I removed my workaround and upgraded to Fedora 32, but that did not solve the issue.

However, I then installed alsa-sof-firmware-1.5-1 from updates-testing, and now it works! Thanks for fixing this!

Comment 42 Steve 2020-06-04 16:17:19 UTC
(In reply to Samuel from comment #41)
> I removed my workaround and upgraded to Fedora 32, but that did not solve the issue.
> 
> However, I then installed alsa-sof-firmware-1.5-1 from updates-testing, and now it works! Thanks for fixing this!

Thanks for doing those before-and-after tests.

Comment 43 Fedora Update System 2020-06-14 17:10:24 UTC
alsa-firmware-1.2.1-6.fc31, alsa-sof-firmware-1.5-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.