On one of our servers -- running CentOS 6 x86_64 -- we're seeing a lot of unusual activity with rpc.statd. We have rpc.statd configured to run on a static port via /etc/sysconfig/nfs:

MOUNTD_PORT=892
STATD_PORT=662
QUOTAD_PORT=875

And this does result in rpc.statd running and listening on this port as expected:

# ps -fe | grep rpc.statd | grep 662
rpcuser  23129     1  0 Apr30 ?        00:00:00 rpc.statd -p 662

What's odd is that on this system, there are also numerous other rpc.statd instances running with the --no-notify flag:

rpcuser    808     1  0 02:23 ?        00:00:00 rpc.statd --no-notify
rpcuser   2052     1  0 07:17 ?        00:00:00 rpc.statd --no-notify
rpcuser   3558     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser   5787     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser   6499     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser   8834     1  0 03:21 ?        00:00:00 rpc.statd --no-notify
rpcuser   9661     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser  13702     1  0 00:08 ?        00:00:00 rpc.statd --no-notify
rpcuser  14813     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser  15375     1  0 08:39 ?        00:00:00 rpc.statd --no-notify
rpcuser  15376     1  0 04:26 ?        00:00:00 rpc.statd --no-notify
rpcuser  19782     1  0 09:36 ?        00:00:00 rpc.statd --no-notify
rpcuser  20491     1  0 05:36 ?        00:00:00 rpc.statd --no-notify
rpcuser  23136     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser  23320     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser  26145     1  0 10:10 ?        00:00:00 rpc.statd --no-notify
rpcuser  26480     1  0 06:24 ?        00:00:00 rpc.statd --no-notify
rpcuser  26598     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify
rpcuser  26821     1  0 01:15 ?        00:00:00 rpc.statd --no-notify
rpcuser  28255     1  0 Apr30 ?        00:00:00 rpc.statd --no-notify

Also odd is that one of these processes has apparently usurped the original rpc.statd process as far as rpcbind is concerned. Running rpcinfo reports statd on the following ports:

# rpcinfo -p
...
100024    1   udp  34322  status
100024    1   tcp  41686  status

These correspond to PID 26145 (which you can see is one of the rpc.statd instances in the above output from ps).

This wouldn't be a problem if everything were working, but yesterday the system began to experience a problem with NFS mounts. Any attempt to mount a new filesystem would result in:

mount.nfs: mount system call failed

Killing off all the rpc.statd services "resolved" the problem, but we're puzzled as to what's going on here. We've never seen this behavior on our similarly configured CentOS 5 systems.

larsks
  • Since rpc.statd is started by mount.nfs, this could be a result of many mount attempts after NFS hiccups. Anything in the logs with a matching STIME? – Dmitri Chubarov May 01 '12 at 17:10

1 Answer

Well, this appears to be partly our fault and partly a bug in Red Hat's authconfig command. Our Puppet configuration was causing authconfig --updateall to be run every hour. This was unnecessary, but generally it shouldn't be a problem...except that authconfig restarts the rpcbind service.

Restarting rpcbind causes it to forget about all the services that have registered with it. While authconfig will then restart NIS-related services, it does not restart rpc.statd, so rpc.statd is still running but no longer registered with rpcbind -- which makes it effectively invisible to applications that attempt to find it via rpcbind. Presumably each later mount attempt that failed to find statd through rpcbind then started a fresh rpc.statd --no-notify instance, which would explain the pile-up.
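One way to spot this condition is to compare what rpcbind reports for statd (program 100024) against the port configured in /etc/sysconfig/nfs. The following is only a minimal sketch, not something we actually run; the function name check_statd_port and its messages are made up for illustration, and it assumes the usual five-column rpcinfo -p layout (program, version, protocol, port, service).

```shell
#!/bin/sh
# check_statd_port EXPECTED_PORT   (hypothetical helper, for illustration only)
# Reads `rpcinfo -p` style output on stdin and reports whether every
# registration for program 100024 (status) uses EXPECTED_PORT.
check_statd_port() {
    awk -v want="$1" '
        # Column 1 is the RPC program number, column 4 is the port.
        $1 == 100024 { seen = 1; if ($4 != want) bad = 1 }
        END {
            if (!seen)    { print "statd not registered"; exit 2 }
            else if (bad) { print "statd registered on unexpected port"; exit 1 }
            else          { print "statd registered on port " want }
        }'
}

# Usage on a live system (STATD_PORT=662 as in the question):
#   rpcinfo -p | check_statd_port 662
```

With the rpcinfo output shown in the question (ports 34322 and 41686), this would report an unexpected port; right after an rpcbind restart it would report that statd is not registered at all.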

I've fixed our Puppet configuration so that it is no longer calling authconfig like this, and I've opened bug 818246 with Red Hat.

larsks