Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.

Bug 1888436

Summary: ostree-prepare-root.service: Failed with result 'core-dump'
Product: [Fedora] Fedora Reporter: Paul Whalen <pwhalen>
Component: ostreeAssignee: Jonathan Lebon <jlebon>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 33CC: awilliam, dustymabe, jlebon, jonathan, miabbott, pbrobinson, robertthomasfairley, travier, walters
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedFreezeException
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-19 18:54:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269538, 1766778    
Attachments:
Description Flags
journalctl
none
rdsosreport.txt
none
full strace output none

Description Paul Whalen 2020-10-14 20:41:07 UTC
Description of problem:
Attempting to boot the armhfp iot disk images fails:

..
[  OK  ] Found device /dev/disk/by-…d-b143-46f2-bd43-da13120110ab.
[  OK  ] Reached target Initrd Root Device.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting File System Check…b143-46f2-bd43-da13120110ab...
[  OK  ] Finished File System Check…d-b143-46f2-bd43-da13120110ab.
         Mounting /sysroot...
[    9.051723] EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: (null)
[  OK  ] Mounted /sysroot.
         Starting OSTree Prepare OS/...
[FAILED] Failed to start OSTree Prepare OS/.
See 'systemctl status ostree-prepare-root.service' for details.


● ostree-prepare-root.service - OSTree Prepare OS/
     Loaded: loaded (/usr/lib/systemd/system/ostree-prepare-root.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Wed 2020-09-30 00:00:29 UTC; 5min ago
       Docs: man:ostree(1)
   Main PID: 768 (code=dumped, signal=SEGV)
        CPU: 10ms

Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Failed with result 'core-dump'.
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Service will not restart (restart setting)
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Changed start -> failed
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Job 538 ostree-prepare-root.service/start finished, result=failed
Sep 30 00:00:29 fedora systemd[1]: Failed to start OSTree Prepare OS/.
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Unit entered failed state.
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Triggering OnFailure= dependencies.
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Consumed 10ms CPU time.
Sep 30 00:00:29 fedora systemd[1]: ostree-prepare-root.service: Control group is empty.
Sep 30 00:00:30 fedora systemd[1]: ostree-prepare-root.service: Changed dead -> failed


Version-Release number of selected component (if applicable):
ostree-2020.6-5.fc33

How reproducible:
Everytime

Comment 1 Paul Whalen 2020-10-14 20:44:05 UTC
Created attachment 1721588 [details]
journalctl

Comment 2 Paul Whalen 2020-10-14 20:44:53 UTC
From the logs:

Sep 30 00:02:21 fedora systemd[2690]: ostree-prepare-root.service: Executing: /usr/lib/ostree/ostree-prepare-root /sysroot
Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Got notification message from PID 219 (FDSTORE=1)
Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Added fd 40 (n/a) to fd store.
Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Got notification message from PID 219 (FDSTORE=1)
Sep 30 00:02:21 fedora systemd[1]: systemd-journald.service: Added fd 41 (n/a) to fd store.
Sep 30 00:02:21 fedora audit[2690]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2690 comm="ostree-prepare-"
 exe="/usr/lib/ostree/ostree-prepare-root" sig=11 res=1
Sep 30 00:02:21 fedora systemd-coredump[2691]: Failed to connect to coredump service: No such file or directory
Sep 30 00:02:21 fedora systemd[1]: Received SIGCHLD from PID 2690 (ostree-prepare-).
Sep 30 00:02:21 fedora systemd[1]: Child 2690 (ostree-prepare-) died (code=dumped, status=11/SEGV)

Comment 3 Paul Whalen 2020-10-14 20:45:42 UTC
Created attachment 1721590 [details]
rdsosreport.txt

Comment 4 Jonathan Lebon 2020-10-14 21:21:15 UTC
Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific (something int-size related issue maybe?). Hmm, this is going to be tricky to debug.

Do you have access to the emergency console after? You should be able to try to run ostree-prepare-root manually yourself. You'll have to mount /sysroot first. So e.g.:

$ mount /dev/disk/by-label/root /sysroot
$ /usr/lib/ostree/ostree-prepare-root /sysroot

If you can do that and the segfault reproduces, one thing you can try is to run it with strace at least so we can tell where roughly in the code to look. You can fetch strace from the host (if it's in the IoT package set), e.g.:

$ /sysroot/ostree/deploy/*/deploy/*.0/usr/bin/strace /usr/lib/ostree/ostree-prepare-root /sysroot

If it's not in the IoT package set, you can also transfer it via e.g. an attached USB device or from the network if booted with rd.neednet=1 (or even using libguestfs to copy it into the /boot partition of the image beforehand).

Could probably take this to the next level and run `gdb.minimal --args ...` on it to get the exact spot where the segfault occurs. :)

Comment 5 Colin Walters 2020-10-14 22:13:58 UTC
> (something int-size related issue maybe?)

Right, may be reproducible with a 32 bit x86 userspace too.  Unfortunately Fedora (sensibly) doesn't ship that anymore...

How reproducible is this?  What version of ostree is involved?  Is it only happening in F33 right now?

Comment 6 Peter Robinson 2020-10-15 10:24:57 UTC
> Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific
> (something int-size related issue maybe?). Hmm, this is going to be tricky
> to debug.

Yes, but we're not the first/only to use ostree for ARMv7, there's Endless and Foundries that support armv7 platforms

Comment 7 Colin Walters 2020-10-15 13:44:15 UTC
(To be clear I meant 32 bit *x86* doesn't make much sense IMO, 32 bit ARM is clearly here for years to come and I definitely want to support it)

Anyways I tried a few things, first a basic 32 bit x86 build seems OK without warnings, but we've had bugs in the past so I added it to our CI (Debian/Ubuntu still support 32 bit x86):
https://github.com/ostreedev/ostree/pull/2218

Second is looking at the file history...
https://github.com/ostreedev/ostree/commits/master/src/switchroot/ostree-prepare-root.c
Not much has changed here recently, the biggest is
https://github.com/ostreedev/ostree/commit/35642259175973617da937f3cab6ce5f13c95077#diff-e72f49c5f255b804b75fc887dd0490bff1d6d999ac6a9476dd23d3fb85a21260
And I stared at that diff for a bit and didn't see any problems.  A possible source of problems might be passing the wrong size value to getline() but
that looks right to me.

Comment 8 Paul Whalen 2020-10-15 15:55:24 UTC
(In reply to Jonathan Lebon from comment #4)
> Ahh fun! The SIGSEGV and armhfp tells me this could be arch-specific
> (something int-size related issue maybe?). Hmm, this is going to be tricky
> to debug.
> 
> Do you have access to the emergency console after? You should be able to try
> to run ostree-prepare-root manually yourself. You'll have to mount /sysroot
> first. So e.g.:
> 
> $ mount /dev/disk/by-label/root /sysroot
> $ /usr/lib/ostree/ostree-prepare-root /sysroot
> 
> If you can do that and the segfault reproduces, one thing you can try is to
> run it with strace at least so we can tell where roughly in the code to
> look. You can fetch strace from the host (if it's in the IoT package set),
> e.g.:
> 
> $ /sysroot/ostree/deploy/*/deploy/*.0/usr/bin/strace
> /usr/lib/ostree/ostree-prepare-root /sysroot

..
readlink("/sysroot/ostree/boot.0", "boot.0.1", 4095) = 8
lstat64("/sysroot/ostree/boot.0.1", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/boot.0.1/fedora-iot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df/0", {st_mode=S_IFLNK|0777, st_size=100, ...}) = 0
readlink("/sysroot/ostree/boot.0.1/fedora-iot/45ebb61204d0dfafdd5022960937239165a9feb31e6996254c80bce3f16681df/0", "../../../deploy/fedora-iot/deplo"..., 4095) = 100
lstat64("/sysroot/ostree/deploy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/deploy/fedora-iot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/deploy/fedora-iot/deploy", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/sysroot/ostree/deploy/fedora-iot/deploy/bc246420ec9cbfe5c0780b4285a86150e9cefa1385b5acf52dacc8ab7edb0b54.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/sysroot/ostree/deploy/fedora-iot/deploy/bc246420ec9cbfe5c0780b4285a86150e9cefa1385b5acf52dacc8ab7edb0b54.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xfc03} ---
[  811.506475] kauditd_printk_skb: 8 callbacks suppressed
[  811.506480] audit: type=1701 audit(1601424807.528:1109): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=3696 comm="ostree-prepare-" exe="/usr/lib/ostree/ostree-prepare-root" sig=11 res=1
+++ killed by SIGSEGV (core dumped) +++
[  811.538309] audit: type=1701 audit(1601424807.560:1110): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=3693 comm="strace" exe="/tmp/boot/strace" sig=11 res=1
Segmentation fault (core dumped)

Comment 9 Paul Whalen 2020-10-15 15:55:56 UTC
Created attachment 1721875 [details]
full strace output

Comment 10 Colin Walters 2020-10-15 16:11:44 UTC
Hmm.

Most useful here would be a core dump.  Use `ulimit -c unlimited` to start should work.

In order to get content out of the initramfs, you can try to bring up IP networking, or one trick I've used in the past is 9p mounts.

For example with coreos-assembler you can just:
cosa run --bind-rw .,/run/workdir
And then in the initramfs manually mount it; see this bit of code for an example
https://github.com/coreos/coreos-assembler/blob/6e4108c1a984513f9adbebb95eb5564b31ca86a5/mantle/platform/conf/conf.go#L1268

Comment 11 Jonathan Lebon 2020-10-15 16:20:10 UTC
OK noticed this at least: https://github.com/ostreedev/ostree/pull/2219

Though I'm not entirely convinced that's the root cause. Using %u instead of %llu would only cause print to read part of the integer, not more of it.

Comment 12 Fedora Update System 2020-10-15 20:23:08 UTC
FEDORA-2020-52d6060686 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-52d6060686

Comment 13 Fedora Blocker Bugs Application 2020-10-16 15:11:34 UTC
Proposed as a Freeze Exception for 33-final by Fedora user pbrobinson using the blocker tracking app because:

 FIxes and issue on ARMv7 IoT

Comment 14 Adam Williamson 2020-10-16 16:11:28 UTC
to be clear, is the bug here "it'll always completely fail to boot on 32-bit ARM"? Or is it more specific than that?

Comment 15 Peter Robinson 2020-10-16 20:10:38 UTC
(In reply to Adam Williamson from comment #14)
> to be clear, is the bug here "it'll always completely fail to boot on 32-bit
> ARM"? Or is it more specific than that?

No, nothing more specific, it fails to boot to login on ARMv7

Comment 16 Fedora Update System 2020-10-16 22:17:30 UTC
FEDORA-2020-52d6060686 has been pushed to the Fedora 33 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-52d6060686`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-52d6060686

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Adam Williamson 2020-10-19 15:47:10 UTC
We have +3 FE votes in https://pagure.io/fedora-qa/blocker-review/issue/180 , accepting.

Comment 18 Fedora Update System 2020-10-19 18:12:48 UTC
FEDORA-2020-52d6060686 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 19 Adam Williamson 2020-10-19 18:49:50 UTC
Update was not set as closing the bug; is it fixed now, can we close it? Or is more required?

Comment 20 Jonathan Lebon 2020-10-19 18:54:36 UTC
(In reply to Adam Williamson from comment #19)
> Update was not set as closing the bug; is it fixed now, can we close it? Or
> is more required?

That was a mistake on my part. Yes, this issue is closed now.

Not sure if CURRENTRELEASE or NEXTRELEASE is more appropriate. Going with the former since it's fixed in the same release this bug is filed against.

Comment 21 Adam Williamson 2020-10-19 19:19:07 UTC
buhbuhhhhhhh

the correct answer is "ERRATA". :)