Bug 2164594 - Systems that update from systemd-252.4-4.fc38 to systemd-253~rc1-1.fc38 fail to boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks: F38BetaBlocker
Reported: 2023-01-25 18:53 UTC by Adam Williamson
Modified: 2023-01-26 11:02 UTC (History)

Fixed In Version: systemd-253~rc1-3.fc38
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-26 10:51:09 UTC
Type: Bug
Embargoed:


Attachments
debug boot log (deleted), 2023-01-25 19:21 UTC, Adam Williamson
unexpected message from udev-builtin-keyboard (deleted), 2023-01-25 22:47 UTC, Zbigniew Jędrzejewski-Szmek


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | systemd/systemd/issues/26216 | 0 | None | open | PID1 enters infinite loop when trying to stop socket with incoming traffic | 2023-01-26 11:02:13 UTC

Description Adam Williamson 2023-01-25 18:53:04 UTC
In openQA testing, and also in manual testing on a local VM, systems installed with systemd-252.4-4.fc38 (e.g. from https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230123.n.0 ) and then updated to systemd-253~rc1-1.fc38 fail to boot. Systems installed directly with systemd-253~rc1-1.fc38 seem to boot OK (in openQA, at least, haven't verified that locally yet).

With default boot options, the boot just gets stuck at the bootsplash. Hitting Esc shows it stuck at "Stopped initrd-switch-root.service - Switch Root.". With systemd.log_level=debug systemd.log_target=console, I see an eternal loop of messages from systemd-journald-audit.socket: "Incoming traffic" and then "Suppressing connection request since unit stop is scheduled." Not sure if there was any more useful message before that.

This was easy to reproduce for me: run a minimal netinst using https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20230123.n.0/compose/Everything/x86_64/os/ as the install source, then update only systemd to systemd-253~rc1-1.fc38 and reboot. I expect this bug will hit anyone updating an existing Rawhide install to the new systemd.

Proposing as an F38 Beta blocker as it probably affects upgrades from F37 too (will try and confirm that in a bit).

Comment 1 Adam Williamson 2023-01-25 19:16:12 UTC
With logs redirected somewhere I can see 'em but not at debug level, I see 'systemd-journald.service: Scheduled restart job, restart counter is at 1.', then repeating 'Looping too fast. Throttling execution a little.' messages.

Comment 2 darrell pfeifer 2023-01-25 19:21:07 UTC
In rescue mode it gets stuck on fsck

Adding no fsck parameter gets to

Mounting sysroot mount - /sysroot ...

Then hangs

Comment 3 Adam Williamson 2023-01-25 19:21:40 UTC
Created attachment 1940497
debug boot log

OK, here's a full debug-level boot log up to the point where the 'systemd-journald-audit.socket' messages start looping.

Comment 4 David Tardon 2023-01-25 20:20:28 UTC
I think initrd needs to be regenerated, otherwise the socket is still started from there.

Comment 5 David Tardon 2023-01-25 20:28:27 UTC
(In reply to David Tardon from comment #4)
> I think initrd needs to be regenerated, otherwise the socket is still
> started from there.

No, scratch that. That should work around the problem, but it's not a fix. The socket should be enabled via presets after the update, but it looks like it isn't?
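
A minimal sketch of how the preset state of the socket could be checked from a live environment, since the affected system does not boot. The mount point /mnt/sysroot is a hypothetical placeholder for wherever the installed root is mounted, and the script only prints the commands unless DRY_RUN=0 is set. With --root=, systemctl works on the unit files under that directory instead of talking to a running manager.

```shell
#!/bin/sh
# Sketch only: inspect and re-apply the preset for the audit socket
# on an installed system mounted from a live environment.
UNIT="systemd-journald-audit.socket"
SYSROOT="${SYSROOT:-/mnt/sysroot}"   # assumption: mount point of the installed root
DRY_RUN="${DRY_RUN:-1}"              # default: print commands, do not run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

check_socket_preset() {
    # Report whether the unit is currently enabled in the installed system.
    run systemctl --root="$SYSROOT" is-enabled "$UNIT"
    # Re-apply the distribution preset policy for this unit.
    run systemctl --root="$SYSROOT" preset "$UNIT"
}

check_socket_preset
```

Run it once as-is to review the commands, then again with DRY_RUN=0 as root to execute them.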

Comment 6 Zbigniew Jędrzejewski-Szmek 2023-01-25 22:46:15 UTC
Hmm, so I'm testing with a VM here, and it fails reliably with kernel-6.2.0-0.rc4.20230120gitd368967cb103.35.fc38.x86_64. But with kernel-6.1.0-0.rc2.21.fc38.x86_64, things work fine.
With the 6.2 kernel, I'm getting an OOPS with a warning, triggered by udevd, about some mapping being done wrong (I'll try to capture it properly later). And very strange messages from udev, that stink of memory corruption.
/dev/vda is not detected at all by the kernel.

I'll try to figure out what is going on tomorrow. It's close to midnight and I need to catch a nap.

Comment 7 Zbigniew Jędrzejewski-Szmek 2023-01-25 22:47:37 UTC
Created attachment 1940523
unexpected message from udev-builtin-keyboard

Comment 8 darrell pfeifer 2023-01-26 06:40:44 UTC
Booting a 6.1 kernel didn't work for me. I tried downgrading:

1) boot a live usb
2) download previous systemd
3) mount the old root via gnome disks and chroot to it
4) verify dnf said it saw the new version
5) rpm -Uvh --oldpackage the older systemd version

When I reboot the bad RC version is still there. I'm sure I've done this in the past. What step did I miss?

Comment 9 Adam Williamson 2023-01-26 07:14:12 UTC
As long as you mounted the right partition and chroot'ed properly, that should be right... maybe you missed a step, just try again? I tend to use dnf downgrade rather than rpm, but it shouldn't matter. Oh, and you'll want to do all subpackages of systemd and downgrade them all together; I usually use `koji download-build --arch=x86_64 --arch=noarch systemd-252.4-4.fc38` (or whatever package and arch), then `dnf downgrade *.rpm`.
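
Put together, the downgrade from a live USB might look roughly like the sketch below. The device name /dev/vda3, the mount point /mnt/sysroot and the download directory /root/rpms are hypothetical placeholders, and koji is assumed to be installed in the live environment. By default the script only prints the commands; set DRY_RUN=0 to execute them as root.

```shell
#!/bin/sh
# Sketch only: downgrade systemd and all its subpackages inside a chroot.
ROOT_DEV="${ROOT_DEV:-/dev/vda3}"     # assumption: partition holding the installed root
MNT="${MNT:-/mnt/sysroot}"            # assumption: where to mount it
BUILD="${BUILD:-systemd-252.4-4.fc38}"
DRY_RUN="${DRY_RUN:-1}"               # default: print commands, do not run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

downgrade_systemd() {
    # Assumption: on a btrfs install with a "root" subvolume, the mount
    # may additionally need "-o subvol=root".
    run mount "$ROOT_DEV" "$MNT"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$MNT/$fs"
    done
    # Download every subpackage of the build into a directory visible in the chroot.
    run mkdir -p "$MNT/root/rpms"
    run sh -c "cd '$MNT/root/rpms' && koji download-build --arch=x86_64 --arch=noarch '$BUILD'"
    # Downgrade all subpackages together; the glob is expanded inside the chroot.
    run chroot "$MNT" sh -c 'dnf downgrade /root/rpms/*.rpm'
    run umount -R "$MNT"
}

downgrade_systemd
```

Downloading outside the chroot means network access does not have to be set up inside it; only the dnf transaction runs against the installed system.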

Comment 10 Fedora Update System 2023-01-26 10:47:41 UTC
FEDORA-2023-326cfb9cf8 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-326cfb9cf8

Comment 11 Fedora Update System 2023-01-26 10:51:09 UTC
FEDORA-2023-326cfb9cf8 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 12 Zbigniew Jędrzejewski-Szmek 2023-01-26 11:02:13 UTC
https://github.com/systemd/systemd/issues/26216 is an upstream bug about PID1 handling this badly.

