
Bug 2121197

Summary: Failing services in new Fedora IoT installations with RO /sysroot
Product: Fedora
Reporter: Paul Whalen <pwhalen>
Component: IoT
Assignee: Peter Robinson <pbrobinson>
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Version: 37
CC: akoutsou, awilliam, bcotton, dustymabe, jmarrero, jonathan, lucab, miabbott, pbrobinson, philip.wyett, robatino, robertthomasfairley, travier, walters
Hardware: All
OS: Linux
Whiteboard: openqa
Last Closed: 2022-08-29 11:01:45 UTC
Type: Bug
Bug Blocks: 1269538, 2009537, 2060976, 2153434

Description Paul Whalen 2022-08-24 19:13:15 UTC
Description of problem:

Recent installations of Fedora 37 IoT have a long list of failing services:

systemctl --all --failed
  UNIT                                LOAD   ACTIVE SUB    DESCRIPTION         >
● dbus-broker.service                 loaded failed failed D-Bus System Message>
● greenboot-grub2-set-counter.service loaded failed failed Set grub2 boot count>
● greenboot-healthcheck.service       loaded failed failed greenboot Health Che>
● NetworkManager.service              loaded failed failed Network Manager
● polkit.service                      loaded failed failed Authorization Manager
● redboot-auto-reboot.service         loaded failed failed Reboot on red boot s>
● rpm-ostreed.service                 loaded failed failed rpm-ostree System Ma>
● systemd-oomd.service                loaded failed failed Userspace Out-Of-Mem>
● systemd-resolved.service            loaded failed failed Network Name Resolut>
● systemd-userdbd.service             loaded failed failed User Database Manage>
● systemd-oomd.socket                 loaded failed failed Userspace Out-Of-Mem>
● systemd-userdbd.socket              loaded failed failed User Database Manage>
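The failed-unit listing above is cut off by the pager. Here is a minimal sketch, assuming a POSIX shell with awk, that pulls just the unit names out of `systemctl --all --failed --plain --no-legend`. In that output `--plain` drops the `●` marker and `--no-legend` drops the header and footer. The helper name `failed_unit_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each failed unit, one per line.
# Expects the output of
#   systemctl --all --failed --plain --no-legend
# on stdin, where the unit name is the first field of every line.
failed_unit_names() {
    # NF skips blank lines; $1 is the UNIT column.
    awk 'NF { print $1 }'
}

# Usage on an affected system (needs systemd):
#   systemctl --all --failed --plain --no-legend | failed_unit_names
```

An empty result means no unit is in the failed state. `systemctl is-system-running` reporting `degraded` is a quicker yes/no signal for the same condition.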

mount | grep /sysroot
/dev/mapper/fedora--iot_fedora-root on /sysroot type ext4 (ro,relatime,seclabel)

After remounting /sysroot read-write, I can start all the services again. Upgrades from F36 are not affected, because /sysroot remains read-write there.
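The read-only mount can also be confirmed from `/proc/mounts`, without reading `mount` output by eye. The sketch below assumes a POSIX shell and awk. `mount_opts` and `is_readonly` are hypothetical helper names. The optional second argument exists only so the functions can be tried against a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: print the mount options of a mount point, reading a
# file in /proc/mounts format: "device mountpoint fstype options dump pass".
mount_opts() {
    mnt=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    # If the path appears more than once, keep the last (topmost) entry.
    awk -v m="$mnt" '$2 == m { opts = $4 } END { if (opts != "") print opts }' "$mounts_file"
}

# Hypothetical helper: succeed if the mount point carries the "ro" option.
is_readonly() {
    # Wrap in commas so "ro" matches only as a whole option.
    case ",$(mount_opts "$1" "${2:-/proc/mounts}")," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On an affected install, `is_readonly /sysroot && echo read-only` should print `read-only`. The reporter's workaround corresponds to `sudo mount -o remount,rw /sysroot`. That remount lasts only until the next reboot, so it is a diagnostic step rather than a fix.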

Version-Release number of selected component (if applicable):
rpm-ostree-2022.12-4.fc37.aarch64
ostree-2022.5-2.fc37.aarch64
anaconda-37.12.1-1.fc37

How reproducible:
Every time

Comment 1 Fedora Blocker Bugs Application 2022-08-25 17:48:18 UTC
Proposed as a Blocker for 37-final by Fedora user coremodule using the blocker tracking app because:

 Proposing as an F37 blocker as it appears to violate the following criterion:

All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.
 
https://fedoraproject.org/wiki/Fedora_37_Final_Release_Criteria#System_services

Comment 2 Adam Williamson 2022-08-25 17:53:40 UTC
yeah, openQA has been hitting this for days. I mentioned it in IRC but didn't see any follow-up at the time.

This prevents all the openQA tests from working, so it might be a Beta blocker, really. Let's throw it on that list for now and I'll see if it's a case of "beta functionality really doesn't work" or just "the noise throws openQA off".

Comment 3 Ben Cotton 2022-08-25 20:54:36 UTC
This sounds like it could be a side effect of https://fedoraproject.org/wiki/Changes/Silverblue_Kinoite_readonly_sysroot

Setting to block 2060976, the tracker for that Change.

Comment 4 Adam Williamson 2022-08-25 21:08:38 UTC
Well, the fact that the change affects IoT isn't a side effect, it's in the description:

"This change applies to new and existing installations of Fedora Silverblue and Kinoite and only to new installations of Fedora IoT."

But the fact that it breaks everything is a problem, yeah. :D

Comment 5 Colin Walters 2022-08-25 21:18:19 UTC
Do you have `rw` on the kernel command line?

Comment 6 Adam Williamson 2022-08-25 22:01:15 UTC
Yeah, it does have that there. I didn't put it there, though. It's like that out of the "box" (the IoT dvd-ostree install image, in my case). Just do a fresh install of https://kojipkgs.fedoraproject.org/compose/iot/Fedora-IoT-37-20220825.0/compose/IoT/x86_64/iso/Fedora-IoT-ostree-x86_64-37-20220825.0.iso without doing anything unusual, boot it, and you have 'rw' in cmdline and hit this bug.

If I take that out of the cmdline or change it to 'ro', the system boot-loops with ostree-prepare-root.service failing.
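The question in Comment 5 can be answered by reading `/proc/cmdline`. The sketch below assumes a POSIX shell and awk, and `root_mode` is a hypothetical name. When both tokens appear, it treats the last `ro`/`rw` token as the effective one. That is an assumption, but it matches how the kernel processes these parameters in order.

```shell
#!/bin/sh
# Hypothetical helper: print "ro", "rw", or "unset" depending on which root
# mount-mode token appears last on the kernel command line.
# Reads /proc/cmdline unless another file is given.
root_mode() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "ro" || $i == "rw") mode = $i   # later tokens override earlier ones
    }
    END { print (mode == "" ? "unset" : mode) }' "${1:-/proc/cmdline}"
}
```

On the out-of-the-box installs described in Comment 6, this would print `rw`.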

openQA tests fail because they try to switch to a different VT, and there is no console running on any VT besides 1 (probably the services that should run one fail to start). This is a clear violation of Beta criterion "A system installed without a graphical package set must boot to a working login prompt without any unintended user intervention, and all virtual consoles intended to provide a working login prompt must do so.", so that supports the Beta blocker nomination.

Comment 7 Adam Williamson 2022-08-25 22:02:32 UTC
Yeah, if I look at logs after switching to VT2 and back to VT1, I see "Failed to start autovt: Transport endpoint is not connected".

Comment 8 Colin Walters 2022-08-26 18:08:17 UTC
Try setting `tmp-is-dir: true` in the manifest in https://pagure.io/fedora-iot/ostree/blob/main/f/fedora-iot-base.json
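The `tmp-is-dir` option changes what `/tmp` looks like in the deployed tree. As we read the rpm-ostree treefile documentation, the default (`false`) makes `/tmp` a symlink to `sysroot/tmp`. With a read-only `/sysroot`, the temporary directory is then presumably no longer writable. Setting it to `true` makes `/tmp` a plain directory that `tmp.mount` can mount as tmpfs. The sketch below checks which layout a system has. It assumes a POSIX shell with `readlink` available (GNU coreutils provides it), and `describe_tmp` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path (default /tmp) is a symlink,
# a real directory, or missing.
# Assumption: without tmp-is-dir, rpm-ostree ships /tmp as a symlink into
# sysroot/tmp; with "tmp-is-dir": true it is a plain directory.
describe_tmp() {
    p=${1:-/tmp}
    if [ -L "$p" ]; then
        # Test -L first: -d would follow the symlink and report its target.
        echo "symlink -> $(readlink "$p")"
    elif [ -d "$p" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}
```

In the manifest itself, the suggested change is a single top-level key, `"tmp-is-dir": true`.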

Comment 9 Adam Williamson 2022-08-26 18:18:34 UTC
Peter, can you try Colin's suggestion?

Comment 10 Peter Robinson 2022-08-27 08:34:04 UTC
Pushed for Rawhide and kicked off a compose.

Comment 11 Adam Williamson 2022-08-27 16:19:43 UTC
That looks good for Rawhide, only two failed tests now:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=38&build=Fedora-IoT-38-20220827.0&groupid=1

I'll look into those. We'll need the change for F37 too.

Comment 12 Adam Williamson 2022-08-27 17:05:05 UTC
Remaining failed tests both look to be caused by https://bugzilla.redhat.com/show_bug.cgi?id=2121944 .

Comment 13 Peter Robinson 2022-08-29 11:01:45 UTC
Applied to F-37 too, thanks Adam and Colin.

Comment 14 Adam Williamson 2022-08-29 15:26:17 UTC
confirmed, we got an F37 compose that is in the same state as Rawhide (most things work, https://bugzilla.redhat.com/show_bug.cgi?id=2121944 is an issue).