Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1817475 - Rare crash of alsactl in daemon mode related to "systemctl stop alsa-state" (or system shutdown as such)
Summary: Rare crash of alsactl in daemon mode related to "systemctl stop alsa-state" (...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: alsa-utils
Version: 33
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jaroslav Kysela
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-26 12:10 UTC by Jan Pokorný [poki]
Modified: 2021-06-10 01:13 UTC (History)
1 user (show)

Fixed In Version: alsa-utils-1.2.5-2.fc34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-10 01:13:09 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Jan Pokorný [poki] 2020-03-26 12:10:28 UTC
First, this was rather difficult to investigate to the at least
partially actionable form, not to speak about about reproducibility.

The main reason for this difficulty was the fact this occurred while
the system was in a shutdown procedure, hence no coredump was preserved,
likely because of some systemd-spotted conflict ass demonstrated in
the journal at around the same time (for another PID, likely something
entirely else crashed as well and took a precedence in getting logged
about, otherwise it could be logged for this crashed alsactl daemon
in the same fashion):

> systemd[1]: Requested transaction contradicts existing jobs: Transaction for systemd-coredump/start is destructive (systemd-poweroff.service has 'start' job queued, but 'stop' is included in transact>
> systemd[1]: systemd-coredump.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction for systemd-coredump/start is destructive (system>
> systemd[1]: systemd-coredump.socket: Failed with result 'resources'.

So, what suspicious I spotted first:

> kernel: alsactl[1037]: segfault at 28 ip 000055d3526bc55a sp 00007fffca8baf40 error 4 in alsactl[55d3526ad000+11000]
> kernel: Code: e9 2b ff ff ff 48 83 4c 14 f8 00 e9 fe fe ff ff e8 ab 25 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 53 48 8b 2f 48 89 fb <8b> 55 28 85 d2 7e 1e 45 31 e4 0f 1f 40 00 48 8b 45 20 4a 8b 3c e0
> audit[1037]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:alsa_t:s0 pid=1037 comm="alsactl" exe="/usr/sbin/alsactl" sig=11 res=1

So, we have a standard SIGSEGV case here, but what's the cause?

Note that the PID is for what's been running since this boot session:

> systemd[1]: Started Manage Sound Card State (restore and store).
> audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=alsa-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> systemd[1]: Reached target Sound Card.
> alsactl[1037]: alsactl 1.2.2 daemon started

Well, I tried to recover the "point of crash" from IP info, but
it appeared unreliable:

hex(0x55d3526bc55a - 0x55d3526ad000) = 0xf55a

(gdb) disassemble /r 0xf55a
Dump of assembler code for function replace_untrusted_chars:
[...]
   0x000000000000f547 <+183>:	75 89	jne    0xf4d2 <replace_untrusted_chars+66>
   0x000000000000f549 <+185>:	48 83 c4 08	add    $0x8,%rsp
   0x000000000000f54d <+189>:	44 89 e0	mov    %r12d,%eax
   0x000000000000f550 <+192>:	5b	pop    %rbx
   0x000000000000f551 <+193>:	5d	pop    %rbp
   0x000000000000f552 <+194>:	41 5c	pop    %r12
   0x000000000000f554 <+196>:	41 5d	pop    %r13
   0x000000000000f556 <+198>:	41 5e	pop    %r14
   0x000000000000f558 <+200>:	41 5f	pop    %r15
>  0x000000000000f55a <+202>:	c3	retq   

$ addr2line -e /usr/sbin/alsactl 0xf55a
> /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/init_utils_string.c:183

$ grep -rnF replace_untrusted_chars /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64
> /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/init_parse.c:1171:				count = replace_untrusted_chars(temp2);
> /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/init_parse.c:1399:				count = replace_untrusted_chars(result);
> /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/init_utils_string.c:153:static int replace_untrusted_chars(char *str)

Well, that'd be rather strange to parse configuration when about to be
shut down, wouldn't it?

I could, however, find the op code match (from "Code:" line above)
and map it down to this offset within this exact alsactl binary
(see at the end): 0x15530

Now, follow with me -- if I understand it correctly, this refers to
the code that would get executed if the segfault didn't chime in:

(gdb) disassemble /r 0x15530
Dump of assembler code for function card_events:
[...]
   0x000000000001552b <+443>:	e8 b0 fb ff ff	callq  0x150e0 <remove_from_list>
>  0x0000000000015530 <+448>:	e9 2b ff ff ff	jmpq   0x15460 <card_events+240>
>  0x0000000000015535 <+453>:	48 83 4c 14 f8 00	orq    $0x0,-0x8(%rsp,%rdx,1)
>  0x000000000001553b <+459>:	e9 fe fe ff ff	jmpq   0x1543e <card_events+206>
>  0x0000000000015540 <+464>:	e8 ab 25 ff ff	callq  0x7af0 <__stack_chk_fail@plt>

(you can verify the opcodes with "Code:" section above)

$ addr2line -e /usr/sbin/alsactl 0x15530 
> /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/daemon.c:25

$ cat -n /usr/src/debug/alsa-utils-1.2.2-1.fc33.x86_64/alsactl/daemon.c | sed -n 250,265p
   250		while (snd_ctl_read(card->handle, ev) == 1) {
   251			type = snd_ctl_event_get_type(ev);
   252			if (type != SND_CTL_EVENT_ELEM)
   253				continue;
   254			mask = snd_ctl_event_elem_get_mask(ev);
   255			snd_ctl_event_elem_get_id(ev, id);
   256			if (mask == SND_CTL_EVENT_MASK_REMOVE) {
   257				remove_from_list(&card->whitelist, id);
   258				remove_from_list(&card->blacklist, id);
>  259				continue;
   260			}
   261			if (mask & SND_CTL_EVENT_MASK_INFO) {
   262				remove_from_list(&card->whitelist, id);
   263				remove_from_list(&card->blacklist, id);
   264			}
   265			if (mask & (SND_CTL_EVENT_MASK_VALUE|

I was unable to find any link between remove_from_list function and
replace_untrusted_chars, to it seems to me as if there's something
fishy with the former way of nailing it, but this latter seems more
plausible -- we can even reconstruct the whole backtrace solely based
on logic of inevitability of caller-callee relations:

> [???]
> alsactl/daemon.c: remove_from_list()  # see above
> alsactl/daemon.c: card_events()  # ditto
> alsactl/daemon.c: state_daemon()
> alsactl/alsactl.c: main()

That's the best approximation of the problem I can provide.

At the very least, I'd suggest reviewing every and all suspect case
reported by static analyzers, perhaps the problem is already known
at this level.

$ rpm -q alsa-utils pulseaudio
> alsa-utils-1.2.2-1.fc33.x86_64
> pulseaudio-13.99.1-2.fc33.x86_64

(the former is important to align on data the offsets above
relate to, the latter may have something to do with disappearance
of sound cards)

snippet from alsa-info.sh:

> !!Soundcards recognised by ALSA
> !!-----------------------------
> 
>  0 [PCH            ]: HDA-Intel - HDA Intel PCH
>                       HDA Intel PCH at 0xf2240000 irq 134
> 
> 
> !!PCI Soundcards installed in the system
> !!--------------------------------------
> 
> 00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
>         Subsystem: Lenovo Device [17aa:5050]

Comment 1 Jan Pokorný [poki] 2020-03-26 14:09:42 UTC
> The main reason for this difficulty was the fact this occurred
> while the system was in a shutdown procedure, hence no coredump
> was preserved, likely because of some systemd-spotted conflict
> as demonstrated in the journal at around the same time

Followed-up at
https://github.com/systemd/systemd/issues/7176#issuecomment-604450737

Comment 2 Ben Cotton 2020-08-11 13:15:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 3 Fedora Update System 2021-06-04 07:44:08 UTC
FEDORA-2021-e53b66a8c6 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-e53b66a8c6

Comment 4 Fedora Update System 2021-06-05 01:09:30 UTC
FEDORA-2021-e53b66a8c6 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-e53b66a8c6`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-e53b66a8c6

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Fedora Update System 2021-06-10 01:13:09 UTC
FEDORA-2021-e53b66a8c6 has been pushed to the Fedora 34 stable repository.
If problem still persists, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.