Note: This is a public test instance of Red Hat Bugzilla. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback at bugzilla.redhat.com.
Bug 1424940 - [systemd] rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Summary: [systemd] rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed ...
Keywords:
Status: CLOSED DUPLICATE of bug 1401561
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 25
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: JianHong Yin
URL:
Whiteboard:
Depends On:
Blocks: 1422947
TreeView+ depends on / blocked
 
Reported: 2017-02-20 06:33 UTC by JianHong Yin
Modified: 2017-04-07 15:26 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-07 15:26:19 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description JianHong Yin 2017-02-20 06:33:12 UTC
Description of problem:

systemctl start nfs hang up long time and fail.

by debug, it always hang up on "systemd-tty-ask-password-agent --watch"
12484 ?        S      0:00  |   \_ sshd: root@pts/0
12493 pts/0    Ss     0:00  |       \_ -bash
13709 pts/0    S+     0:00  |           \_ systemctl start nfs
13710 pts/0    S+     0:00  |               \_ /usr/bin/systemd-tty-ask-password-agent --watch

# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago
  Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13657 (code=exited, status=1/FAILURE)

Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.


Version-Release number of selected component (if applicable):
[root@bkr-hv03-guest40 ~]# lsb_release -sir
Fedora 25
[root@bkr-hv03-guest40 ~]# rpm -q nfs-utils
nfs-utils-2.1.1-1.fc25.x86_64

How reproducible:
always

Steps to Reproduce:
1. mkdir /home/test
2. echo "/home/test *(rw,no_root_squash)" >/etc/exports
3. systemctl stop nfs && systemctl start nfs

Actual results:

systemctl start nfs hang up long time and fail.

[root@bkr-hv03-guest40 nfstest]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:23:36 EST; 1min 58s ago
  Process: 13657 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13656 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13657 (code=exited, status=1/FAILURE)

Feb 20 01:14:50 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13657]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:23:36 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.


Expected results:
works fine

Additional info:

Comment 1 JianHong Yin 2017-02-20 06:50:46 UTC
How reproducible: always

'''
[root@bkr-hv03-guest40 ~]# echo "/usr/share *(ro)" >/etc/exports
[root@bkr-hv03-guest40 ~]# systemctl stop nfs
[root@bkr-hv03-guest40 ~]# time systemctl start nfs
Job for nfs-server.service failed because the control process exited with error code.
See "systemctl status nfs-server.service" and "journalctl -xe" for details.

real	8m47.408s
user	0m0.003s
sys	0m0.005s
[root@bkr-hv03-guest40 ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 01:46:04 EST; 2min 23s ago
  Process: 13858 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 13857 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 13858 (code=exited, status=1/FAILURE)

Feb 20 01:37:17 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[13858]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 01:46:04 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.
'''

Comment 3 Scott Mayhew 2017-02-20 17:07:30 UTC
I think the problem may be with the latest rpcbind update.

What happens if you run this before step 3 in your reproducer?:

systemd-tmpfiles --create

Comment 4 Scott Mayhew 2017-02-20 17:28:51 UTC
Or better yet, try this:

systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf

I think that probably needs to be added to the postinstall scriptlet of rpcbind.

Comment 5 JianHong Yin 2017-02-21 02:51:57 UTC
(In reply to Scott Mayhew from comment #4)
> Or better yet, try this:
> 
> systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> 
> I think that probably needs to be added to the postinstall scriptlet of
> rpcbind.

seems systemd-tmpfiles can not solve the problem:

'''
[root@bkr-hv03-guest40 ~]# cat /etc/exports
/usr/share *(ro)
[root@bkr-hv03-guest40 ~]# systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
[root@bkr-hv03-guest40 ~]# time systemctl start  nfs
Job for nfs-server.service failed because the control process exited with error code.
See "systemctl status nfs-server.service" and "journalctl -xe" for details.

real	8m47.313s
user	0m0.004s
sys	0m0.005s
[root@bkr-hv03-guest40 ~]# time systemctl status  nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 21:48:47 EST; 1min 7s ago
  Process: 15662 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 15661 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 15662 (code=exited, status=1/FAILURE)

Feb 20 21:40:00 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Starting NFS server and services...
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com rpc.nfsd[15662]: rpc.nfsd: unable to set any sockets for nfsd
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: Failed to start NFS server and services.
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Unit entered failed state.
Feb 20 21:48:47 bkr-hv03-guest40.dsal.lab.eng.bos.redhat.com systemd[1]: nfs-server.service: Failed with result 'exit-code'.

real	0m0.009s
user	0m0.005s
sys	0m0.002s
'''

'''
15564 pts/1    Ss     0:00  |       \_ -bash
15651 pts/1    S+     0:00  |           \_ systemctl start nfs
15652 pts/1    S+     0:00  |               \_ /usr/bin/systemd-tty-ask-password
'''

Comment 6 Steve Dickson 2017-02-21 13:46:43 UTC
(In reply to Scott Mayhew from comment #4)
> Or better yet, try this:
> 
> systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> 
> I think that probably needs to be added to the postinstall scriptlet of
> rpcbind.

I think this commit took care of the problem you are talking about:

commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471
Author: Steve Dickson <steved>
Date:   Sat Jan 28 15:46:33 2017 -0500

    Corrected boot dependency in systemd files (bz 1401561)
    
    Signed-off-by: Steve Dickson <steved>

Comment 7 Steve Dickson 2017-02-21 13:48:42 UTC
(In reply to Yin.JianHong from comment #5)
> '''
> 15564 pts/1    Ss     0:00  |       \_ -bash
> 15651 pts/1    S+     0:00  |           \_ systemctl start nfs
> 15652 pts/1    S+     0:00  |               \_
> /usr/bin/systemd-tty-ask-password
> '''
The question I have is why is systemd-tty-ask-password being called?? 
I'm thinking that's where the time is being spent...

Comment 8 Scott Mayhew 2017-02-21 13:56:14 UTC
(In reply to Steve Dickson from comment #6)
> (In reply to Scott Mayhew from comment #4)
> > Or better yet, try this:
> > 
> > systemd-tmpfiles --create /usr/lib/tmpfiles.d/rpcbind.conf
> > 
> > I think that probably needs to be added to the postinstall scriptlet of
> > rpcbind.
> 
> I think this commit took care of the problem you are talking about:
> 
> commit 0df1187eb1cde0df1a2cd1ffb09ab7782a9b5471
> Author: Steve Dickson <steved>
> Date:   Sat Jan 28 15:46:33 2017 -0500
> 
>     Corrected boot dependency in systemd files (bz 1401561)
>     
>     Signed-off-by: Steve Dickson <steved>

I see... I can reproduce the problem easily though, and in my case the problem is that nfsd can't register with rpcbind because rpcbind's not running, and rpcbind's not running because /run/rpcbind doesn't exist and it can't create the lock file.

[root@localhost ~]# systemctl status nfs
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/generator/nfs-server.service.d
           └─order-with-mounts.conf
   Active: failed (Result: exit-code) since Tue 2017-02-21 08:38:49 EST; 15s ago
  Process: 28004 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
  Process: 28002 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
  Process: 27998 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
  Process: 28035 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=1/FAILURE)
  Process: 28034 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 28035 (code=exited, status=1/FAILURE)

Feb 20 11:36:38 localhost.localdomain systemd[1]: Starting NFS server and services...
Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: writing fd to kernel failed: errno 110 (Connection timed out)
Feb 21 08:38:49 localhost.localdomain rpc.nfsd[28035]: rpc.nfsd: unable to set any sockets for nfsd
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Main process exited, code=exited, status=1/FAILURE
Feb 21 08:38:49 localhost.localdomain systemd[1]: Failed to start NFS server and services.
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Unit entered failed state.
Feb 21 08:38:49 localhost.localdomain systemd[1]: nfs-server.service: Failed with result 'exit-code'.

[root@localhost ~]# systemctl status rpcbind
● rpcbind.service - RPC Bind
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2017-02-20 11:24:50 EST; 21h ago
     Docs: man:rpcbind(8)
 Main PID: 2879 (code=exited, status=1/FAILURE)

Feb 20 11:24:50 localhost.localdomain systemd[1]: Starting RPC Bind...
Feb 20 11:24:50 localhost.localdomain rpcbind[2879]: rpcbind: /run/rpcbind/rpcbind.lock: No such file or directory
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Main process exited, code=exited, status=1/FAILURE
Feb 20 11:24:50 localhost.localdomain systemd[1]: Failed to start RPC Bind.
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Unit entered failed state.
Feb 20 11:24:50 localhost.localdomain systemd[1]: rpcbind.service: Failed with result 'exit-code'.

Comment 9 Scott Mayhew 2017-02-21 13:59:00 UTC
(In reply to Steve Dickson from comment #7)
> (In reply to Yin.JianHong from comment #5)
> > '''
> > 15564 pts/1    Ss     0:00  |       \_ -bash
> > 15651 pts/1    S+     0:00  |           \_ systemctl start nfs
> > 15652 pts/1    S+     0:00  |               \_
> > /usr/bin/systemd-tty-ask-password
> > '''
> The question I have is why is systemd-tty-ask-password being called?? 
> I'm thinking that's where the time is being spent...

Just a thought... since rpcbind runs as the rpc user instead of root, maybe that's preventing it from using systemd-tmpfile-setup?

Comment 10 Jeff Layton 2017-02-21 14:03:15 UTC
As a side note...

Maybe we should consider having rpc.nfsd print a more helpful error message when it sees -ETIMEDOUT here? That's almost always occurring because it can't talk to rpcbind.

It might also not hurt to consider allowing knfsd to start up when rpcbind registration fails, but throw a printk to warn about missing registration.

Comment 11 Steve Dickson 2017-03-24 18:58:36 UTC
I think I finally fix this problem.... Please see
   https://bugzilla.redhat.com/show_bug.cgi?id=1401561

Comment 12 Steve Dickson 2017-04-07 15:26:19 UTC

*** This bug has been marked as a duplicate of bug 1401561 ***


Note You need to log in before you can comment on or make changes to this bug.