Bug 1308771 - Current Rawhide Workstation live image does not reach GDM due to mislabelled /run/systemd/inhibit and /run/user/1000
Summary: Current Rawhide Workstation live image does not reach GDM due to mislabelled ...
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 24
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Duplicates: 1309896 1309897 1309975 1310376 1310377 1310378 1310398
Depends On:
Blocks: F24AlphaBlocker 1314372
Reported: 2016-02-16 01:03 UTC by Adam Williamson
Modified: 2016-06-12 06:24 UTC (History)
CC List: 21 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1314372
Environment:
Last Closed: 2016-03-07 17:23:32 UTC
Type: Bug
Embargoed:


Attachments
extract from journal on affected boot (268.95 KB, text/plain), 2016-02-16 01:18 UTC, Adam Williamson
ausearch output after enforcing=0 boot (49.49 KB, text/plain), 2016-02-16 01:25 UTC, Adam Williamson

Description Adam Williamson 2016-02-16 01:03:36 UTC
Somewhere between 2016-02-07 and 2016-02-14, boot of Rawhide live images broke. 2016-02-07 was the last time it worked:

https://openqa.fedoraproject.org/tests/5011

Between 2016-02-08 and 2016-02-13, no Workstation live image was successfully built for Rawhide. On 2016-02-14, the test failed:

https://openqa.fedoraproject.org/tests/5316

and it failed similarly on 2016-02-15. Booting manually in a VM, I also see the system fail to reach GDM. If I boot with 'rhgb quiet' removed from the boot parameters, the boot seems to proceed normally up to "Started GNOME Display Manager. ... Started Hostname Service. ... Started Virtualization daemon." but then I see:

[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.

A few other messages are interwoven for a bit, but then they dry up, and the system just sits there repeating those two messages:

[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.
[  OK  ] Stopped User Manager for UID 1000.
[  OK  ] Removed slice user-1000.slice.

It does that a couple dozen times, then a few more network messages show up (from the kernel, apparently):

device virbr0-nic left promiscuous mode
virbr0: port 1(virbr0-nic) entered disabled state
IPv6: ADDRCONF(NETDEV_UP): virbr0-nic: link is not ready

Then things just stop; there are no further messages. I can't get to a login prompt on any tty (indeed, tty switching doesn't appear to work).

Booting to runlevel 3 seems to work OK. Still, assigning to systemd for now as this seems to be maybe a user session management issue?

Proposing as an Alpha blocker: "All release-blocking images must boot in their supported configurations." https://fedoraproject.org/wiki/Fedora_24_Alpha_Release_Criteria#Release-blocking_images_must_boot

Comment 2 Adam Williamson 2016-02-16 01:05:05 UTC
Note: there was a new systemd in Rawhide in the relevant period. systemd-229-1 landed on 2016-02-11.

Comment 3 Adam Williamson 2016-02-16 01:17:03 UTC
Aha. When I boot with systemd.log_level=debug , I can see that the problem seems to be that X crashes:

Received SIGCHLD from PID 1734 (Xorg).
Child 1734 (Xorg) died (code=exited, status=1/FAILURE)

Furthermore, I managed to get to a tty, and looking at the journal, I see a whole bunch of errors, including a ton of SELinux denials. Booting with 'enforcing=0' reaches a GDM screen (which is wrong, it should auto-login, but it at least boots).

So, we seem to have an SELinux issue. Re-assigning, and attaching a log extract.

Comment 4 Adam Williamson 2016-02-16 01:18:25 UTC
Created attachment 1127480
extract from journal on affected boot

As the attempt to start the session just keeps looping, the log grows too big to extract easily, but here's an extract which I think covers a couple of iterations of the loop.

Comment 5 Adam Williamson 2016-02-16 01:25:31 UTC
Created attachment 1127481
ausearch output after enforcing=0 boot

After booting with enforcing=0, here's what I get with 'ausearch -m avc -ts recent' - several dozen denials.
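With several dozen denials, it can help to group them by source type, target type and object class before reading them one by one. The following is a minimal Python sketch of that idea, assuming the usual `avc: denied { ... } ... scontext=... tcontext=... tclass=...` record layout printed by ausearch. The function names and the sample record are hypothetical; the sample is illustrative only and is not copied from the attachment.

```python
import re
from collections import Counter

# Matches the key fields of an SELinux AVC denial record as printed by ausearch.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def context_type(context):
    """Return the type field (third component) of an SELinux context string."""
    return context.split(":")[2]


def summarize_denials(lines):
    """Count denials per (source type, target type, object class)."""
    counts = Counter()
    for line in lines:
        match = AVC_RE.search(line)
        if match:
            key = (
                context_type(match["scontext"]),
                context_type(match["tcontext"]),
                match["tclass"],
            )
            counts[key] += 1
    return counts


# Illustrative record only: NOT copied from the attachment on this bug.
SAMPLE = [
    'type=AVC msg=audit(1455585900.000:100): avc:  denied  { read } for  pid=1 '
    'comm="example" scontext=system_u:system_r:example_t:s0 '
    'tcontext=system_u:object_r:default_t:s0 tclass=file permissive=1',
]

if __name__ == "__main__":
    for key, n in summarize_denials(SAMPLE).most_common():
        print(n, *key)
```

A large count with a target type of default_t would point at mislabelled files rather than at missing policy rules, which is what the label check below turned up.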

Aha. It looks like /run/systemd/inhibit is mislabelled. All its contents seem to have label:

system_u:object_r:default_t

when they should have:

system_u:object_r:systemd_logind_inhibit_var_run_t:s0

/run/user/1000 also seems to have issues:

unconfined_u:object_r:default_t:s0->unconfined_u:object_r:config_home_t:s0
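For readers less familiar with the notation: a label has the form user:role:type, optionally followed by a level, and the "current->expected" line above pairs the label a file has with the one policy says it should have. A small Python sketch of how such a line could be split and compared follows. The helper names are hypothetical, and comparing only the type field is a simplifying assumption.

```python
from typing import NamedTuple, Optional


class Context(NamedTuple):
    user: str
    role: str
    type: str
    level: Optional[str]  # MLS/MCS level, e.g. "s0"; may be absent


def parse_context(text):
    """Split 'user:role:type[:level]' into its fields."""
    parts = text.strip().split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {text!r}")
    level = parts[3] if len(parts) == 4 else None
    return Context(parts[0], parts[1], parts[2], level)


def parse_change(text):
    """Parse 'current->expected' into a pair of Context values."""
    current, expected = text.split("->", 1)
    return parse_context(current), parse_context(expected)


def is_mislabelled(current, expected):
    """Simplification: treat a differing type field as a mislabel."""
    return current.type != expected.type


if __name__ == "__main__":
    # Labels quoted in this comment.
    cur, exp = parse_change(
        "unconfined_u:object_r:default_t:s0->unconfined_u:object_r:config_home_t:s0"
    )
    print(cur.type, "should be", exp.type, "->", is_mislabelled(cur, exp))
```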

Comment 6 Adam Williamson 2016-02-16 01:45:01 UTC
Yup, I confirmed with the 2016-02-06 image - on that one, only /run/user/1000/keyring files seem to have labelling issues, nothing else in /run/user/1000 and nothing in /run/systemd is mislabelled.

I'm not sure what's changed in terms of how those files are created.

Comment 7 Lukas Vrabec 2016-02-16 12:43:15 UTC
Hi, 
I can also reproduce this issue. I would say the systemd folks need to look into this, because systemd runs restorecon to fix labels in "/run". Maybe the issue is somewhere there.

Comment 8 Miroslav Grepl 2016-02-17 09:10:14 UTC
What does matchpathcon show you?

Can we confirm it is a systemd issue? Did you try to downgrade?

Comment 9 Adam Williamson 2016-02-17 09:15:45 UTC
You can't really 'downgrade' systemd in a live image. The information we have is:

1) it broke between 2016-02-06 and 2016-02-14
2) selinux hasn't changed noticeably in that time
3) systemd is responsible for labelling the affected paths
4) systemd had a major change in the relevant timeframe

If you look in https://bugzilla.redhat.com/attachment.cgi?id=1127481 there's output from 'restorecon -nvr', which does more or less what matchpathcon does (the -n option to restorecon tells it not to actually make the changes, -v tells it to print out what it *would* change, so it's effectively a way to check the labels for a given path).
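For anyone wanting to repeat that check, here is a rough Python sketch that wraps the same dry run. It assumes restorecon is on PATH and that the caller has enough privilege to read the labels; the function names are hypothetical, and the paths are the two discussed in this bug.

```python
import shutil
import subprocess


def restorecon_dry_run_command(path):
    """Build a restorecon invocation that reports, but does not apply, label changes."""
    # -n: do not change any labels; -v: print what would change; -r: recurse.
    return ["restorecon", "-n", "-v", "-r", path]


def would_relabel(path):
    """Return restorecon's dry-run output lines, or None if restorecon is unavailable."""
    if shutil.which("restorecon") is None:
        return None
    result = subprocess.run(
        restorecon_dry_run_command(path),
        capture_output=True,
        text=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for target in ("/run/systemd/inhibit", "/run/user/1000"):
        lines = would_relabel(target)
        if lines is None:
            print("restorecon not found; is this an SELinux-enabled system?")
            break
        print(target, "->", len(lines), "path(s) would be relabelled")
```

An empty result means restorecon printed no pending changes for that tree, which is the outcome expected on a correctly labelled system.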

Comment 10 Miroslav Grepl 2016-02-18 11:26:14 UTC
(In reply to awilliam from comment #9)
> You can't really 'downgrade' systemd in a live image. The information we
> have is:
> 
> 1) it broke between 2016-02-06 and 2016-02-14

Yep, I meant a live image with an older version of systemd.

> 2) selinux hasn't changed noticeably in that time
> 3) systemd is responsible for labelling the affected paths
> 4) systemd had a major change in the relevant timeframe

Ok, it looks like a systemd issue here.

Thank you.

> 
> If you look in https://bugzilla.redhat.com/attachment.cgi?id=1127481 there's
> output from 'restorecon -nvr', which does more or less what matchpathcon
> does (the -n option to restorecon tells it not to actually make the changes,
> -v tells it to print out what it *would* change, so it's effectively a way
> to check the labels for a given path).

Comment 11 Miroslav Grepl 2016-02-19 10:36:02 UTC
*** Bug 1309975 has been marked as a duplicate of this bug. ***

Comment 12 Miroslav Grepl 2016-02-19 10:36:18 UTC
*** Bug 1309896 has been marked as a duplicate of this bug. ***

Comment 13 Miroslav Grepl 2016-02-19 10:36:34 UTC
*** Bug 1309897 has been marked as a duplicate of this bug. ***

Comment 14 Joachim Frieben 2016-02-20 20:49:13 UTC
Even using today's network install media, a freshly installed system does not boot into graphical mode. It turns out that the labels mentioned in comment 5 are wrong after the install but they are set correctly after forcing a full relabeling of the file system (touch /.autorelabel and reboot), see attachments

    https://bugzilla.redhat.com/attachment.cgi?id=1128860
    https://bugzilla.redhat.com/attachment.cgi?id=1128861

to bug 1309903.

Comment 15 Charles R. Anderson 2016-02-21 03:47:40 UTC
Description of problem:
Boot into Fedora-Live-Workstation-x86_64-rawhide-20160220.iso with enforcing=0.

Version-Release number of selected component:
selinux-policy-3.13.1-171.fc24.noarch

Additional info:
reporter:       libreport-2.6.4
hashmarkername: setroubleshoot
kernel:         4.5.0-0.rc4.git3.1.fc24.x86_64
type:           libreport

Comment 16 Charles R. Anderson 2016-02-21 03:58:10 UTC
*** Bug 1310377 has been marked as a duplicate of this bug. ***

Comment 17 Charles R. Anderson 2016-02-21 03:59:03 UTC
*** Bug 1310376 has been marked as a duplicate of this bug. ***

Comment 18 Charles R. Anderson 2016-02-21 04:00:08 UTC
*** Bug 1310378 has been marked as a duplicate of this bug. ***

Comment 19 Giulio 'juliuxpigface' 2016-02-21 10:41:18 UTC
*** Bug 1310398 has been marked as a duplicate of this bug. ***

Comment 20 satellitgo 2016-02-21 22:05:26 UTC
qemu/kvm - "enforcing=0" on boot gets to GUI on live; only "liveinst-T" works for installer. graphical boot fails on reboot with systemctl set-default graphical.target [1]

[1] https://fedoraproject.org/wiki/Test_Results:Fedora_24_Rawhide_20160220_Installation

Comment 21 Adam Williamson 2016-02-22 04:43:04 UTC
So I'm planning to do some manual bisection of this (by building systemd packages at various git commits and building live images with them included). So far I've confirmed that a live image built from current Rawhide with systemd returned to the state of 228-8.gite35a787 (and epoch-bumped) reaches a desktop and does not have the mislabellings - /run/systemd/inhibit is correctly labelled, and in /run/user/1000 only the keyring tree is mislabelled (as it was before this bug appeared).

I'll try and pin things down to a git commit tomorrow.

Comment 22 Joachim Frieben 2016-02-22 14:58:53 UTC
(In reply to Adam Williamson from comment #21)
It might be more economical to start from a current Fedora rawhide system installed in a virtual machine and to downgrade the systemd-related packages and reboot the system successively until the labeling is done correctly.
This issue, also reported in bug 1309903, is by no means restricted to the live image.

Comment 23 Charles R. Anderson 2016-02-22 15:21:44 UTC
I was able to get Fedora-Live-Workstation-x86_64-rawhide-20160220.iso installed by booting with:

enforcing=0 systemd.unit=multi-user.target

Log in on the text console as root, set a root password and liveuser password, and edit /etc/gdm/custom.conf to turn off AutoLogin.  Then:

systemctl isolate graphical.target

Log in graphical as root and run liveinst from there.  Installation went fine.  Boot the installed system using enforcing=0.

Comment 24 Charles R. Anderson 2016-02-22 16:11:01 UTC
The labeling problem doesn't happen with systemd-228-10.gite35a787.fc24 but does happen with systemd-229-1.fc24.

Comment 25 Adam Williamson 2016-02-22 16:39:22 UTC
Joachim: I can't actually reproduce that version. I run Rawhide on my desktop, and the labelling is correct for me.

Comment 26 Adam Williamson 2016-02-22 16:40:09 UTC
Charles: the issues beyond labelling are I think to do with GNOME and Wayland in the live session and are not related to this bug.

Comment 27 Petr Schindler 2016-02-22 19:23:37 UTC
Discussed at 2016-02-22 blocker review meeting: [1]. 

This bug was accepted as Alpha blocker: clear violation of "All release-blocking images must boot in their supported configurations."

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2016-02-22/f24-blocker-review.2016-02-22-17.00.html

Comment 28 Joachim Frieben 2016-02-23 08:03:52 UTC
(In reply to Adam Williamson from comment #26)
1. Current live media boot correctly into GNOME (on Wayland) on bare metal after adding kernel option "enforcing=0". The steps suggested in comment 23 are unnecessary.
2. Current live media boot correctly into GNOME (on Wayland) in a -virtual machine- with kernel option "enforcing=0" but heavy flickering related to a QXL DRM issue (qxl 0000:00:02.0: ... unpin not necessary) makes the session unusable.

Comment 29 Adam Williamson 2016-02-23 15:48:12 UTC
I got sidetracked into other work but I'll try to get back to triaging it soon. So far I had found that 35ad41d361a2d9e766f2d7689b92cfbc4304ddbd - Jan 1st - is good, no mislabelling.

Comment 30 Adam Williamson 2016-02-24 00:29:55 UTC
Bisect news: the bug appears to be somewhere between:

https://github.com/systemd/systemd/commit/795ab08f783e78e85f1493879f13ac44cb113b00 (Feb 1)

and:

https://github.com/systemd/systemd/commit/ef9fde5378c0b2614991f9e3c4ac525cc07736a8 (Feb 7)

Comment 31 Adam Williamson 2016-02-24 00:32:51 UTC
Now we've reduced the range, this commit rather catches the eye:

https://github.com/systemd/systemd/commit/4b51966cf6c06250036e428608da92f8640beb96

I'm gonna check that one.

Comment 32 Adam Williamson 2016-02-24 03:14:05 UTC
Yep, that indeed turned out to be the culprit. A systemd built at git commit https://github.com/systemd/systemd/commit/d58669f08abefcc4300e1f476b6482e5f7e87098 - the one immediately before the selinux one - works OK. systemd built at https://github.com/systemd/systemd/commit/4b51966cf6c06250036e428608da92f8640beb96 fails.
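The narrowing-down in comments 29 to 32 is a manual bisection: test a commit between a known-good and a known-bad one, then keep whichever half still contains the breakage. A minimal Python sketch of that search is below. It assumes the commits are listed in chronological order and that every commit after the first bad one is also bad. The abbreviated hashes are the ones named in comments 30 and 32; the "placeholder" entries and the is_bad stand-in are hypothetical, since each real test meant building systemd at that commit and composing and booting a live image.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and once a commit is bad all later ones are too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    # Named commits come from comments 30 and 32; the "placeholder-*"
    # entries stand in for commits that are not listed in this bug.
    history = [
        "795ab08f783e",  # Feb 1, good end of the range
        "placeholder-1",
        "d58669f08abe",  # good
        "4b51966cf6c0",  # bad
        "placeholder-2",
        "ef9fde5378c0",  # Feb 7, bad end of the range
    ]
    bad_from = history.index("4b51966cf6c0")
    # Stand-in for "build systemd at this commit, compose a live image, boot it".
    print(first_bad(history, lambda c: history.index(c) >= bad_from))
```

Each step halves the remaining range, which matters when a single test is as expensive as a full image compose.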

I guess that change in how labelling gets done makes Arch work but breaks Fedora?

Comment 33 Lennart Poettering 2016-02-24 12:43:40 UTC
See my comments here:

https://github.com/systemd/systemd/pull/2508#issuecomment-188235477

Comment 34 Jan Kurik 2016-02-24 15:45:38 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 35 Petr Lautrbach 2016-03-02 19:31:18 UTC
FYI https://github.com/keszybz/systemd/commit/c3dacc8bbf2dc2f5d498072418289c3ba79160ac should fix this problem.

Comment 36 Zbigniew Jędrzejewski-Szmek 2016-03-02 19:35:11 UTC
OK, thanks for looking into this.

Can you comment on 
https://github.com/keszybz/systemd/commit/5c5433ad32c3d911f0c66cc124d190d40a2b5f5b
too?

Comment 37 Petr Lautrbach 2016-03-03 12:52:31 UTC
Commented, the change is right and wanted.

Comment 38 Lukas Vrabec 2016-03-03 15:36:53 UTC
I added fixes for this issue in selinux-policy rpm package (version selinux-policy-3.13.1-176.fc24). 
So /etc/selinux/targeted/contexts/files/file_contexts.bin will be available in Fedora Live images.

@Adam:
Could you try creating a new live image with this new version of selinux-policy?

http://koji.fedoraproject.org/koji/buildinfo?buildID=741436

Thank you.

Comment 39 Adam Williamson 2016-03-03 16:12:25 UTC
Will do - I'd usually have tested the patch right away, but I'm buried in getting QA stuff synced up with the new compose process ATM :/ but I'll get it checked one way or another (a new compose should be along soon enough anyhow).

Comment 40 Lukas Vrabec 2016-03-03 20:22:17 UTC
Great! 

Thank you!

Comment 41 Joachim Frieben 2016-03-05 17:55:19 UTC
Fedora-Workstation-Live-x86_64-24-20160305.0.iso boots correctly into the GNOME (on Wayland) session in enforcing mode. Installed packages include selinux-policy-targeted-3.13.1-176.fc24.

Comment 42 Zbigniew Jędrzejewski-Szmek 2016-03-07 17:23:32 UTC
I guess we can close this for now. There's still some stuff to figure out in the systemd/selinux interface, but /run/systemd/inhibit and /run/user/1000 are fine (apart from /run/user/1000/keyring, but that's a separate issue).

