Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1433459 - long boot times - selinux denials causes cloud-init to take a long time
Summary: long boot times - selinux denials causes cloud-init to take a long time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: fedora-release
Version: 26
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Dennis Gilmore
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-03-17 16:47 UTC by Dusty Mabe
Modified: 2017-06-09 19:01 UTC (History)
10 users (show)

Fixed In Version: fedora-release-26-0.7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-09 19:01:09 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Dusty Mabe 2017-03-17 16:47:58 UTC
Description of problem:

I'm seeing an issue on the fedora cloud base image but I assume that it affects other images. The issue is that the system takes a long time to boot because it is waiting on systemd-networkd-wait-online.service, but at least on the cloud image we don't use networkd, so that is an invalid thing to wait on:


Mar 17 13:34:22 localhost.localdomain systemd[1]: Network Service is not active.
Mar 17 13:34:22 localhost.localdomain systemd[1]: Dependency failed for Wait for Network to be Configured.
Mar 17 13:34:22 localhost.localdomain systemd[1]: systemd-networkd-wait-online.service: Job systemd-networkd-wait-online.service/start failed with result 'dependency'.

Some investigation with sgallagh showed this:

[root@cloudhost ~]# cd /usr/lib/systemd/system-preset/ 
[root@cloudhost system-preset]# grep systemd-networkd-wait-online ./*
./90-systemd.preset:enable systemd-networkd-wait-online.service


Version-Release number of selected component (if applicable):
https://kojipkgs.fedoraproject.org/compose/26/Fedora-26-20170316.0/compose/CloudImages/x86_64/images/Fedora-Cloud-Base-26_Alpha-1.1.x86_64.qcow2

systemd-233-2.fc26.x86_64
fedora-release-26-0.5.noarch

How reproducible:
always

Comment 2 Fedora Blocker Bugs Application 2017-03-17 20:39:29 UTC
Proposed as a Freeze Exception for 26-alpha by Fedora user sgallagh using the blocker tracking app because:

 This is a low-risk change that avoids an issue where systemd reports the system being in degraded mode because it is unable to start networkd on boot.

Comment 3 Dusty Mabe 2017-03-20 18:01:18 UTC
the above PRs have been merged. Is there an rpm built with these changes in it?

Comment 4 Stephen Gallagher 2017-03-21 11:55:35 UTC
(In reply to Dusty Mabe from comment #3)
> the above PRs have been merged. Is there an rpm built with these changes in
> it?

Not yet. We need someone from rel-eng to create it.

Comment 5 Adam Williamson 2017-03-22 19:15:09 UTC
+1 FE.

Comment 6 Jon Disnard 2017-03-22 19:15:18 UTC
+1 FE

Comment 7 Nick Bebout 2017-03-22 19:16:11 UTC
+1 FE

Comment 8 Fedora Update System 2017-03-22 19:28:04 UTC
fedora-release-26-0.6 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-90bcb067bf

Comment 9 Adam Williamson 2017-03-22 19:36:13 UTC
That's +3/+4 FE, marking accepted.

Comment 10 Fedora Update System 2017-03-23 14:25:24 UTC
fedora-release-26-0.6 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-90bcb067bf

Comment 11 Adam Williamson 2017-03-24 01:02:14 UTC
So, I looked into this a bit before just blindly karma'ing the update, and it's a bit strange.

I tested a Workstation live image, and it doesn't have this problem. The boot is not held up by this service, and this service doesn't actually start.

So I thought maybe it's disabled, but it isn't:

[root@localhost-live test]# systemctl is-enabled systemd-networkd-wait-online.service
enabled

yet it shows as inactive:

[root@localhost-live test]# systemctl status systemd-networkd-wait-online.servie 
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/usr/lib/systemd/system/systemd-networkd-wait-online.service;
   Active: inactive (dead)
     Docs: man:systemd-networkd-wait-online.service(8)

so I looked at the unit file, and it has this:

Requisite=systemd-networkd.service

What does "Requisite" mean? Well, from the systemd.unit man page:

       Requisite=
           Similar to Requires=. However, if the units listed here are not
           started already, they will not be started and the transaction will
           fail immediately.

i.e., if I'm reading it correctly, this basically means "if systemd-networkd isn't active, don't run this service".

So I'm actually not convinced by this change, without further investigation. It seems to me that leaving this service enabled by default *should* be the right thing to do, as it should mean that if someone decides to enable systemd-networkd.service after install, the system will properly wait for the network to come up before boot proceeds. If we disable this service by default, then people who want to use systemd-networkd will need to know or figure out that they should *also* enable this service, and I don't think that's going to be obvious at all, and it doesn't seem like the right thing to do to require this of them.

Zbigniew, do you have any thoughts here?

Can someone take another look at this and see if they can figure out why this service is actually active on the cloud images at all? Is systemd-networkd.service actually starting up, for some reason, on those images?

Comment 12 Adam Williamson 2017-03-24 01:18:35 UTC
Bit more confusing data here...I tried enabling systemd-networkd.service on my WS install and rebooting. Noted:

1) Boot takes longer
2) It doesn't actually bring the network up (even though this is a VM so I'd expect it to be able to just automatically bring up the only interface via DHCP)
3) systemd-networkd-wait-online.service *still* shows as inactive and has no log messages, though systemd-networkd.service shows as active

Comment 13 Kevin Fenzi 2017-03-24 03:09:14 UTC
Well, note that the cloud image uses the old 'network' service. This won't matter with NetworkManager and nothing uses systemd-networkd out of the box. 

It seems to fix the bug here. 

1. boot f26 alpha rc2 cloud image: 
systemd-analyze: 
Startup finished in 631ms (kernel) + 1.220s (initrd) + 3min 50.711s (userspace) = 3min 52.564s


2. systemctl disable systemd-networkd-wait-online
3. reboot: 

$ systemd-analyze 
Startup finished in 645ms (kernel) + 1.150s (initrd) + 15.166s (userspace) = 16.963s

Comment 14 Dusty Mabe 2017-03-24 04:37:17 UTC
Kevin, the test that you did doesn't mean anything because on 2nd boot even without disabling that service the system boots up fast.


I agree with Adam. When I was first looking into this I couldn't quite figure it out and then stephen seemed like he knew the answer so I stopped looking into it. Now i'm back.

Another thing to note is that we see this on the Atomic Host qcow images we produce nightly, which use NetworkManager. So we see it on cloud base images, which use legacy network scripts, and Atomic Host, which uses NetworkManager.

So why don't we see it on other images that use NM? What's common between the images we do see this problem on? --> cloud-init. and...

[root@cloudhost ~]# systemd-analyze blame | head -n 3
    3min 22.161s cloud-init.service
          5.627s network.service
          2.310s cloud-init-local.service

Then I started to look at the journal from around this time. There are quite a few selinux denials (reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1435310). 

It turns out if I set enforcing=0 on the kernel command line then the system boots up fine.

So, this is not a bug with systemd-networkd-wait-online.service. It is actually a symptom of selinux denials which cause cloud-init to take a long time to start. The solution to BZ1435310 should resolve this issue.

Comment 15 Stephen Gallagher 2017-03-24 12:56:57 UTC
(In reply to Adam Williamson from comment #11)
> What does "Requisite" mean? Well, from the systemd.unit man page:
> 
>        Requisite=
>            Similar to Requires=. However, if the units listed here are not
>            started already, they will not be started and the transaction will
>            fail immediately.
> 
> i.e., if I'm reading it correctly, this basically means "if systemd-networkd
> isn't active, don't run this service".

This is accurate, but incomplete. What it *also* means is that this service will be marked as *failed* if systemd-networkd.service isn't running. In other words, it's asserting that it is an invalid configuration to try to start systemd-networkd-wait-online.service if systemd-networkd.service isn't enabled.

If you run `systemctl status` (with no arguments), you will note that it is reporting: "State: degraded". So if nothing else, this patch resolves that misconfiguration.

That being said, I see your point with regards to making sure the systemd-networkd-wait-online.service should always be started with the systemd-networkd.service, so I think the correct solution there would be to add "Also=systemd-networkd-wait-online.service" to the systemd-networkd.service [Install] selection (and the reverse on the other side).

Zbigniew: what do you think about that?

Comment 16 Stephen Gallagher 2017-03-24 12:59:08 UTC
Also, given that we have a different bug already to track the SELinux denial issue, can we leave this bug tracking the "State: Degraded" issue?

Comment 17 Dusty Mabe 2017-03-24 13:26:15 UTC
I'll let adamw decide that. The problem I was originally reporting turned out to have a different root cause than what we thought, so I changed the title accordingly.

Comment 18 Dusty Mabe 2017-03-24 13:58:29 UTC
I'll also note that I don't see a degraded state on an install from a live workstation ISO:

```
[root@localhost-live ~]# systemctl status systemd-networkd-wait-online.service 
● systemd-networkd-wait-online.service - Wait for Network to be Configured
   Loaded: loaded (/usr/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:systemd-networkd-wait-online.service(8)
[root@localhost-live ~]# journalctl | grep wait
Mar 24 09:44:03 localhost-live audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-quit-wait comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 24 09:44:03 localhost-live audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-quit-wait comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
```

Comment 19 Adam Williamson 2017-03-24 15:48:59 UTC
I'm fine with having this bug track the 'degraded' issue, but please come up with a precise description of the scenario, because as Dusty and I found, I don't think you have that yet. It doesn't seem to be the case that 'any system with systemd-networkd-wait-online.service enabled and systemd-networkd.service disabled winds up in degraded state'.

I'll transfer the accepted FE status to the SELinux bug, since we voted on the symptom that turned out to be caused by that bug. This could be proposed again on its merits after the description is corrected.

Comment 20 Zbigniew Jędrzejewski-Szmek 2017-03-25 04:09:50 UTC
(In reply to Stephen Gallagher from comment #15)
> (In reply to Adam Williamson from comment #11)
> > What does "Requisite" mean? Well, from the systemd.unit man page:
> > 
> >        Requisite=
> >            Similar to Requires=. However, if the units listed here are not
> >            started already, they will not be started and the transaction will
> >            fail immediately.
> > 
> > i.e., if I'm reading it correctly, this basically means "if systemd-networkd
> > isn't active, don't run this service".
> 
> This is accurate, but incomplete. What it *also* means is that this service
> will be marked as *failed* if systemd-networkd.service isn't running.
Yes.

> If you run `systemctl status` (with no arguments), you will note that it is
> reporting: "State: degraded". So if nothing else, this patch resolves that
> misconfiguration.
Yes.

Note that systemd-networkd-wait-online.service has WantedBy=network-online.target
(not m-u.target), so it's not activated unless network-online.target
itself is activated. So the issue with systemd-networkd-wait-online.service
timing out requires systemd-networkd.service to be disabled *and*
systemd-networkd-wait-online.service to be enabled *and*
network-online.target to be enabled or pulled in by something.

> That being said, I see your point with regards to making sure the
> systemd-networkd-wait-online.service should always be started with the
> systemd-networkd.service, so I think the correct solution there would be to
> add "Also=systemd-networkd-wait-online.service" to the
> systemd-networkd.service [Install] selection (and the reverse on the other
> side).
Yeah, that makes a lot of sense → https://github.com/systemd/systemd/pull/5635.

Nevertheless, I think the PR for Fedora presets still makes sense, even if PR5635
goes in.

(In reply to Adam Williamson from comment #12)
> Bit more confusing data here...I tried enabling systemd-networkd.service on
> my WS install and rebooting. Noted:
>
> 2) It doesn't actually bring the network up (even though this is a VM so I'd
> expect it to be able to just automatically bring up the only interface via
> DHCP)
No default configuration is provided. So enabling systmed-networkd by itself
does not do anything (except for containers, where some default "safe" config
is provided for veths).

> 3) systemd-networkd-wait-online.service *still* shows as inactive and has no
> log messages, though systemd-networkd.service shows as active
What about network-online.target? If that is not activated,
systemd-networkd-wait-online.service is not started either.

Comment 21 Dusty Mabe 2017-04-13 02:41:03 UTC
what's the resolution around this?

Comment 22 Stephen Gallagher 2017-04-24 12:53:38 UTC
(In reply to Dusty Mabe from comment #21)
> what's the resolution around this?

We pushed the change in the default presets, but it was unclear if this was in fact the root cause of your issue. Are you still experiencing the problem on recent branched composes? If so, we may need to investigate other possible causes.

Comment 23 Dusty Mabe 2017-04-24 13:25:35 UTC
we are not having the "long boot times" issue any longer. This was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1435310.

We do still see the `Job systemd-networkd-wait-online.service/start failed with result 'dependency'` issue, though, which turned out to not be the root cause of our problem, but is still a (mostly cosmetic) failure during boot.

Comment 24 Stephen Gallagher 2017-04-24 15:45:11 UTC
(In reply to Dusty Mabe from comment #23)
> we are not having the "long boot times" issue any longer. This was fixed in
> https://bugzilla.redhat.com/show_bug.cgi?id=1435310.
> 
> We do still see the `Job systemd-networkd-wait-online.service/start failed
> with result 'dependency'` issue, though, which turned out to not be the root
> cause of our problem, but is still a (mostly cosmetic) failure during boot.

That's because it seems that there hasn't been a rebuilt `fedora-release` package yet. That should happen soon as there are several other changes queued up in there.

Comment 25 Fedora Update System 2017-05-18 01:20:18 UTC
fedora-release-26-0.7 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-6b842d9c19

Comment 26 Fedora Update System 2017-05-18 19:04:22 UTC
fedora-release-26-0.7 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-6b842d9c19

Comment 27 Fedora Update System 2017-06-09 19:01:09 UTC
fedora-release-26-0.7 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.