
Bug 1093803

Summary: dhclient fails to renew IP address lease after system time changes
Product: Red Hat Enterprise Linux 7
Component: dhcp
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: high
Target Milestone: rc
Keywords: Reproducer, TestCaseApproved
Reporter: Jiri Jaburek <jjaburek>
Assignee: Pavel Zhukov <pzhukov>
QA Contact: Ondrej Mejzlik <omejzlik>
CC: awilliam, devurandom, egasiorowski, freaky, hartsjc, ipilcher, jburke, jeharris, jpopelka, jstancek, linuxgcc, matthew.dowdell, mbliss, michaelv, omejzlik, osabart, pemensik, psppsn96, ptalbert, pzhukov, rhsu5, sbrivio, sferguso, sukulkar, thaller, thozza, xingli, zguo
Fixed In Version: dhcp-4.2.5-78.el7, bind-9.11.4-11.P2.el7
Type: Bug
Last Closed: 2020-03-31 19:57:27 UTC
Bug Blocks: 1095800, 1380362, 1393869, 1534569, 1709724, 1716960
Attachments:
reproducer, extract from the audit-test suite, utils.plib (flags: none)

Description Jiri Jaburek 2014-05-02 17:26:54 UTC
Created attachment 891977
reproducer, extract from the audit-test suite, utils.plib

Description of problem:

dhclient appears to be affected by changes to the system "wall clock" time and, in some cases (which are 100% reproducible), can disconnect the interface completely (its address is removed).

Version-Release number of selected component (if applicable):
dhclient-4.2.5-27.el7

How reproducible:
always


Steps to Reproduce:
1. set up an isolated private network with a DHCP server using, e.g., a 1-minute lease time
2. on a separate system (machine) connected to the DHCP server's network, stop NetworkManager and the network service, and kill any running dhclient instances
3. bring the target interface up manually and start dhclient manually
4. verify (using tcpdump) that dhclient renews the address every ~1 minute
5. between DHCP requests, advance the system time by an amount larger than the lease time, e.g. 2 days
6. observe dhclient send two requests and then continue working normally, with ~1 minute pauses
7. roll the system time back by 2 days
8. observe that no further DHCP requests are sent; after ~2 minutes the IPv4 address is removed, disconnecting any active ssh sessions
9. from a physical TTY, note that the dhclient process is still running and has logged nothing unusual; its last message concerns a future renewal


Actual results:
dhclient is affected by system time changes

Expected results:
dhclient operates independently of the system time
(or at least survive large jumps)

Additional info:
This issue was originally found and described in bug 1034737 as a possible NetworkManager problem. Since then, NM has been changed to use CLOCK_BOOTTIME, but the issue remains, so I tried without NM and was still able to reproduce it using the steps described above.

The steps are a simplification of several test cases done by the audit-test suite, which we use for Common Criteria Certification testing. An example reproducer, extracted from the suite, is attached.
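To make steps 5 to 8 concrete, here is a toy Python model. It assumes, as comment 13 describes, that the renewal timer is an absolute deadline on the gettimeofday() wall clock, and it uses the RFC 2131 default of renewing at half the lease time. It is a hypothetical simplification, not dhclient's code, and its exact timings differ from the real client's.

```python
from __future__ import annotations

LEASE = 60.0              # 1-minute lease, as in step 1 of the reproducer
TWO_DAYS = 2 * 86400.0    # size of the clock jump in steps 5 and 7


def simulate(jumps: dict[int, float], seconds: int) -> tuple[list[int], int | None]:
    """Toy model (not dhclient's code) of a client that stores its renewal
    deadline as an absolute wall-clock timestamp.

    `jumps` maps a real (elapsed) second to a wall-clock adjustment applied at
    that second. Returns the real seconds at which renewals were sent and the
    real second at which the lease expired (None if it never did).
    """
    wall = 1_000_000.0                 # arbitrary wall-clock start value
    renew_at = wall + LEASE / 2        # renewal deadline, in wall-clock time
    lease_expires = LEASE              # lease end, in real elapsed time
    renewals: list[int] = []
    for t in range(seconds):
        wall += jumps.get(t, 0.0)      # settimeofday()-style step
        if wall >= renew_at:           # timer compares against the wall clock
            renewals.append(t)
            renew_at = wall + LEASE / 2
            lease_expires = t + LEASE  # server extended the lease
        if t >= lease_expires:
            return renewals, t         # the address would be removed here
        wall += 1.0                    # one real second passes
    return renewals, None


print(simulate({}, 200))
print(simulate({40: TWO_DAYS, 100: -TWO_DAYS}, 300))
```

With no jumps the model renews every 30 s and the lease never runs out. With a 2-day forward jump at second 40 the stored deadline is suddenly in the past, so it renews at once and again 30 s later; after the rollback at second 100 the deadline sits two days in the future, nothing more is sent, and the lease expires at second 130.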

Comment 2 Thomas Haller 2014-05-02 18:34:23 UTC
(In reply to Jiri Jaburek from comment #0)
> Created attachment 891977
> reproducer, extract from the audit-test suite, utils.plib
>
> Actual results:
> dhclient is affected by system time changes
> 
> Expected results:
> dhclient can operate independently on system time
> (or at least survive large jumps)
> 
> Additional info:
> This issue was originally found and described in bug 1034737 as a possible
> NetworkManager problem. Since then, NM was altered to use CLOCK_BOOTTIME,
> but the issue remained, so I tried it without NM and was still able to
> reproduce it, using the steps described above.
> 
> The steps are a simplification of several test cases done by the audit-test
> suite, which we use for Common Criteria Certification testing. An example
> reproducer, extracted from the suite, is attached.


FTR: Regarding NetworkManager, ...

NM expects dhclient to report back in time with a lease update. Bug 1034737 shows that dhclient indeed does not report back to extend the address lifetime, so the kernel removes the address.

NM itself does not act as a watchdog for dhclient's timeouts. I think that is correct, because dhclient not extending the address lifetime is not an error from NM's point of view.

Comment 3 Jiri Popelka 2014-05-05 11:21:52 UTC
Thanks, yes, I know about this issue (bug #916116, comment #2).
I never got around to fixing it because it seemed like an invasive change to me (the affected code is shared by dhclient and dhcpd). We should fix it, sure, but I'm not sure about RHEL-7, as we don't have enough resources (for historical reasons, dhcp/dhclient is "sanity"-tested by the RTT team) to make sure it doesn't break anything else.

Comment 4 Jiri Jaburek 2014-05-09 15:39:58 UTC
(In reply to Jiri Popelka from comment #3)
> Thanks, yes, I know about this issue (bug #916116, comment #2).
> I never got into fixing it as it seemed like a invasive change to me (the
> affected code is common for dhclient and dhcpd). We should fix it, sure, but
> I'm not sure about RHEL-7 as we don't have enough sources (dhcp/dhclient is
> from historical reasons "sanity"-tested by RTT team) to make sure it doesn't
> break anything else.

Regarding my use case, is there some (preferably easily scriptable) way to tell dhclient to stop waiting and send a new lease request? The dhclient manpage mentions "OMAPI" and an omshell(1) binary, which unfortunately ships in the dhcp (server) package rather than with dhclient, and which seems somewhat unfinished and hard to script without expect (Tcl).
The manpage also mentions "THE CONTROL OBJECT" without explaining *what* it actually is (a file? a socket? where?). It could perhaps be used as well, to pause and immediately resume dhclient, though that would break some connections.

In an ideal world, dhclient would respond to SIGHUP / SIGUSR1 / SIGUSR2 by restarting the negotiation sequence (requesting a new lease).

Thanks,
Jiri

Comment 5 Jiri Popelka 2014-05-09 15:48:09 UTC
(In reply to Jiri Jaburek from comment #4)
> is there some (preferably easily scriptable) way to
> tell dhclient to stop waiting and send a new lease request?

I'm not aware of any.

> The dhclient manpage mentions "OMAPI" and a omshell(1) binary

I've never tried to use omshell on dhclient, only dhcpd.

Comment 9 James Hartsock 2014-10-08 20:18:45 UTC
*** Bug 1148159 has been marked as a duplicate of this bug. ***

Comment 13 Jiri Popelka 2015-03-12 14:43:41 UTC
I have bad news.

Using monotonic time (CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME) instead of gettimeofday() in dhclient/dhcpd wouldn't be a problem.

However, the message dispatching code uses [1] the timer mechanism from bind's libisc library, which also uses [2] gettimeofday().

So far I have no idea how to fix it without rewriting bind's internals.
There is actually the option of using our own timer instead of libisc's, but that would probably mean reverting [3], which is a step back, and I have no idea what it might break.

[1] https://source.isc.org/cgi-bin/gitweb.cgi?p=dhcp.git;a=blob;f=common/dispatch.c;hb=HEAD#l354
[2] https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=blob;f=lib/isc/unix/stdtime.c#l76
[3] https://source.isc.org/cgi-bin/gitweb.cgi?p=dhcp.git;a=commitdiff;h=98bf16077d22f28e288a18e184a9d1f97cb5f4f7
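The idea in this comment, sketched in Python: compute each deadline as "now + delay" on a monotonic clock, so that stepping the date with settimeofday() or chrony cannot move it. This assumes CLOCK_BOOTTIME (Linux only, and it keeps counting during suspend) with a fallback to time.monotonic(); the `boottime` helper and the `MonotonicTimers` class are hypothetical names, not libisc's or dhclient's API.

```python
import heapq
import time
from typing import Callable, List, Optional, Tuple


def boottime() -> float:
    """Seconds on a clock that settimeofday() and NTP steps do not move.

    Uses CLOCK_BOOTTIME where Python exposes it (Linux; it also counts time
    spent suspended), otherwise falls back to time.monotonic().
    """
    clk = getattr(time, "CLOCK_BOOTTIME", None)
    if clk is not None:
        return time.clock_gettime(clk)
    return time.monotonic()


class MonotonicTimers:
    """Hypothetical timer queue whose deadlines are relative to a monotonic
    clock; a sketch of the idea, not libisc's or dhclient's implementation."""

    def __init__(self, clock: Callable[[], float] = boottime) -> None:
        self._clock = clock
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0                      # tie-breaker for equal deadlines

    def add(self, delay: float, name: str) -> None:
        """Schedule `name` to fire `delay` seconds from now."""
        heapq.heappush(self._heap, (self._clock() + delay, self._seq, name))
        self._seq += 1

    def pop_due(self) -> List[str]:
        """Remove and return the names of all timers whose deadline passed."""
        now = self._clock()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

    def next_timeout(self) -> Optional[float]:
        """Seconds until the earliest deadline, e.g. for a select() timeout."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - self._clock())
```

For deterministic testing, any callable can be passed as `clock`. With timers added at 30 s and 52.5 s (RFC 2131's default T1 and T2 for a 60 s lease), advancing the clock to 30 makes `pop_due()` return `["renew"]` and leaves 22.5 s until the next deadline, no matter what the wall clock does in between.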

Comment 18 Adam Williamson 2015-10-29 04:28:30 UTC
I suspect that we (Mike Ruckman and I) have just rediscovered this in Fedora 23 validation testing. The scenario we found involves installs to a VM using libvirt networking.

libvirt issues fairly short leases; I think they're valid for an hour. So we've seen this happen on first boot after install:

1. system clock is wrong at first - it's a little over an hour fast. dhclient runs, obviously, while the system clock is still wrong: it's showing 21:58 when the real time is 20:46.

2. dhclient plans to renew the lease as it usually does, just before it's half-expired: the logs show a message "renewal in 1478 seconds". That would be approx 22:22.

3. chrony kicks in right after the network comes up and adjusts the system clock back to the correct time - 20:46.

So now if you run the numbers we have a lease that will expire at 21:46, but dhclient isn't planning to try and renew it until 22:22. And indeed at 21:46 the system's network connection disappears.

It seems that when dhclient *does* kick in and try to renew the lease, it fails; I don't know if that's a separate bug, or just a symptom of trying to renew a lease too late. So the upshot is that on the first boot after install, if the system clock is fast and the router issues fairly short leases (of course, how short the leases have to be depends on how inaccurate the system clock is), you'll wind up with a network connection that drops for good some time after you boot, until you reboot or manually reset the connection somehow.
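The arithmetic above, worked through in Python with the comment's own numbers (a 1-hour lease, "renewal in 1478 seconds", a clock showing 21:58 at a real time of 20:46). It assumes, as the scenario does, that the renewal timer follows the stepped wall clock while the lease lifetime does not; the calendar date is an arbitrary placeholder.

```python
from datetime import datetime, timedelta

# Figures from the scenario above; the calendar date is an arbitrary placeholder.
DAY = datetime(2000, 1, 1)
wall_at_lease = DAY.replace(hour=21, minute=58)  # what the fast clock showed
real_at_lease = DAY.replace(hour=20, minute=46)  # the actual time
lease = timedelta(hours=1)                       # libvirt lease, "I think" 1 h
renew_in = timedelta(seconds=1478)               # "renewal in 1478 seconds"

# The renewal is stored as an absolute wall-clock time ...
renew_at = wall_at_lease + renew_in
# ... but the lease runs out after `lease` of real elapsed time.
expires_at = real_at_lease + lease

# Once chrony steps the clock back, wall time equals real time, so:
late_by = renew_at - expires_at
print(f"renewal at {renew_at:%H:%M:%S}, lease expires at {expires_at:%H:%M}")
print(f"renewal is {late_by.total_seconds():.0f} s too late")

# In this model the renewal misses the lease whenever
# clock_offset > lease - renew_in, i.e. roughly half the lease when T1 ~ lease/2.
clock_offset = wall_at_lease - real_at_lease
assert (clock_offset > lease - renew_in) == (late_by > timedelta(0))
```

This prints a renewal time of 22:22:38 against an expiry of 21:46, i.e. the renewal comes 2198 s too late. The last check restates the comment's point that how short the lease must be depends on how far off the clock is: a clock that is fast by more than the lease time minus the renewal delay loses the lease.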

Comment 19 Jiri Popelka 2016-04-05 09:50:53 UTC
*** Bug 1323971 has been marked as a duplicate of this bug. ***

Comment 30 Pavel Zhukov 2017-04-18 10:03:32 UTC
*** Bug 1361934 has been marked as a duplicate of this bug. ***

Comment 31 friend 2017-04-19 12:09:25 UTC
Where is the link to the fix?

Comment 32 Honggang LI 2017-08-02 13:42:33 UTC
*** Bug 1446115 has been marked as a duplicate of this bug. ***

Comment 33 sushil kulkarni 2017-08-02 18:25:12 UTC
Removing the Networking RPL; this should possibly be added to the Core Services RPL.

Comment 34 Pavel Zhukov 2017-08-28 08:43:59 UTC
*** Bug 1485047 has been marked as a duplicate of this bug. ***

Comment 36 Ferry 2017-12-21 10:21:10 UTC
Bug at ISC: https://bugs.isc.org/Public/Bug/Display.html?id=45540

Comment 43 Ed Gasiorowski 2018-05-15 22:07:11 UTC
Please use the new AMI Aptio firmware release and retest
Slimpro: 3.06.25
Aptio: Label-026

Comment 46 Freddy Wissing 2019-06-14 19:18:37 UTC
Attaching my customer's case to this bug.

Their circumstance is close enough to the core issue here that it makes sense to include it.  

1. A VMware image is migrated to AWS and started.

2. Predictably, after around 1 hour, the instance loses network connectivity and is subsequently restarted by a CloudWatch job.

  (The AWS DHCP lease time is 3600 s, i.e. 1 hour.)

3. Upon review, the RTC is set to localtime, and not UTC as it should be. 

When they set it to UTC, the issue vanishes. Upon reverting the RTC back to localtime, the issue reproduces predictably.

Comment 69 errata-xmlrpc 2020-03-31 19:57:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1087