I/O to NFS mount occasionally fails - server timed out

I have a Linux-based fileserver (ark) which exports a RAID volume over NFSv4.

Sometimes when doing large copy operations, it will time out.

[nathan@ebisu /mnt/extra/disk] rsync -a --progress . /mnt/raid/backup/backup.extra/disk
sending incremental file list
BSD.0/
BSD.0/BSD.0.vdi
   411336704  12%   48.60MB/s    0:00:56
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "/mnt/raid/backup/backup.extra/disk/BSD.0/BSD.0.vdi": Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (32 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

I know that this is a timeout because dmesg tells me so:

 [nathan@ebisu ~] dmesg | tail
 [52722.138132] nfs: server ark not responding, timed out
 [52722.138137] nfs: server ark not responding, timed out
 [52722.138145] nfs: server ark not responding, timed out
 [52722.138150] nfs: server ark not responding, timed out
 [52722.138154] nfs: server ark not responding, timed out

In case you think this is a bug specific to rsync, I tried doing a regular copy too:

[nathan@ebisu /mnt/extra/disk] cp BSD.0/BSD.0.vdi /mnt/raid/backup/backup.extra/disk
cp: error writing ‘/mnt/raid/backup/backup.extra/disk/BSD.0.vdi’: Input/output error
cp: failed to extend ‘/mnt/raid/backup/backup.extra/disk/BSD.0.vdi’: Input/output error

I don't even know where to begin to look to fix this problem. They are both connected via gigabit ethernet through a gigabit switch. I have used ethtool to validate that both are actually running at gigabit speeds. Most operations between the host and the server work fine; it's only in the middle of big transfers that it dies.
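
(For reference, that verification amounts to roughly the following on each machine; a generic sketch, with eth0 as an assumed interface name, run as root:)

    ethtool eth0 | grep -E 'Speed|Duplex'    # expect "Speed: 1000Mb/s" and "Duplex: Full"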

Nothing in the fileserver's dmesg stands out as being awkward.

[root@ark ~]# dmesg | tail
[    7.088959] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[    7.266363] NFSD: starting 90-second grace period (net ffffffff81880e80)
[ 8492.222871] type=1326 audit(1365926452.334:2): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=336 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7fe1be17edc7 code=0x0
[ 8492.314714] type=1326 audit(1365926452.424:3): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=338 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7fe30fd9ddc7 code=0x0
[ 8492.405336] type=1326 audit(1365926452.514:4): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=340 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f6bb032ddc7 code=0x0
[ 8492.501048] type=1326 audit(1365926452.611:5): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=342 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f81d7c2fdc7 code=0x0
[ 8492.603056] type=1326 audit(1365926452.714:6): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=344 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f97c8bc9dc7 code=0x0
[ 8492.703732] type=1326 audit(1365926452.814:7): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=346 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f0661b2fdc7 code=0x0
[ 8492.837977] type=1326 audit(1365926452.947:8): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=348 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7fd024f8cdc7 code=0x0
[54125.173195] type=1326 audit(1365972085.286:9): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=353 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f390a6b9dc7 code=0x0

The syslog is similarly devoid of any issues.

Some more random diagnostic information I gathered:

[root@ebisu etc]# nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
1057273    34163      1050608 

That's a whole lot of retrans.

I checked to see if I was saturating my nfsd threads, but no, they were mostly idle.
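
(For reference, that check amounts to roughly the following on the server; a generic sketch, not a capture from ark:)

    # the first field after "th" is the thread count; the second is (roughly) how often
    # all threads were busy at once - a large, growing value would indicate saturation
    grep ^th /proc/net/rpc/nfsd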

Just for fun, I did a similar transfer completely locally, to see if I was hitting disk errors or slowness:

[root@ark ~]# rsync --progress test.img /mnt/bigraid/backup/backup.ark/
test.img
  8589934592 100%   48.38MB/s    0:02:49 (xfer#1, to-check=0/1)

sent 8590983238 bytes  received 31 bytes  50386998.65 bytes/sec
total size is 8589934592  speedup is 1.00

Looks like it hit a bit under 50MB/s, which is roughly the speed I was getting on the remote rsync.

I tried doing the transfer while running htop on the server, and I did notice that after a while it seems like nfsd may have requested more memory buffers. It may be memory related, as by modern standards the server is not a high-memory system. But it seems to me that this should just cause the transfer to go slower, not to time out entirely.

Nathan

Anything in Ark's log file? Maybe it's a disk timeout. – Johnny – 2013-04-11T19:34:09.767

I am assuming these two are not connected using a cross-over cable but through some network switches and routers. I'd ask the network admins to put a sniffer on both sides of this connection and determine when and where this timeout problem starts and what leads to it. To me it sounds a lot like networking gear not liking the large amounts of data passing through and dropping the connection. – MelBurslan – 2013-04-11T19:51:27.017

@Johnny Added to question as edit – Nathan – 2013-04-11T19:56:02.730

@Mel_Burslan I briefly described the network topology in the question. There is no "network admin" (or, more precisely, that would be me. This is my home network). I could do a tcpdump and grunge through it, though I doubt that would be helpful. I'll try to scrounge another gigabit switch from somewhere in case it's the switch that's flaking out. – Nathan – 2013-04-11T19:58:41.183

It seems to me that the problem may actually be with the network. Since the logs are clean apart from the timeout errors, it may be a network robustness issue. Can you ping the nfs server from another terminal while copying and see if at any time the network seems to drop? – Bichoy – 2013-04-18T01:18:10.370

@Bichoy Good suggestion. I tried that and had 0% packet loss with an average time of .42 ms. I also tried Mel_Burslan's suggestion of replacing the switch, which didn't help. I don't think it's a network timeout. – Nathan – 2013-04-18T13:12:06.153

Can you try checking/adjusting the various limits listed in this article? http://nfs.sourceforge.net/nfs-howto/ar01s05.html

– Bichoy – 2013-04-18T17:44:58.400

@Bichoy Block size: 16k. MTU: 1500. Exportfs uses the 'sync' option. Backing store is a raid5. I think that's all the tunables that article mentions. – Nathan – 2013-04-18T18:41:05.910
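
(For reference, those values can be read back with something like the following; a generic sketch, with eth0 as an assumed interface name:)

    nfsstat -m          # lists each NFS mount with its effective rsize/wsize and other options
    ip link show eth0   # the "mtu 1500" field is the interface MTU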

Answers

2

This is not really an answer but a few troubleshooting tips:

  1. Make sure the problem is actually tied to NFS: export the same volume using another protocol, SMB for example (see here for instructions). Do you get the same errors? Alternatively, try copying with scp:

    [nathan@ebisu ~] scp root@ark:/mnt/bigraid/backup/backup.ark/test.img .
    
  2. Does this only happen when copying a single large file or do you also get the same errors if you copy the same amount of data in lots of little files?

    split test.img
    rsync -a --progress x* /mnt/raid/backup/backup.extra/disk
    
  3. According to this page, high retrans values indicate

    that the number of available NFS kernel threads on the server is insufficient to handle the requests from this client

    So, try increasing the number of threads by setting the RPCNFSDCOUNT variable. Depending on your distribution this could be set in /etc/sysconfig/nfs or in /etc/default/nfs-kernel-server (that's where it is on my Debian). Try something like

    RPCNFSDCOUNT=16
    
  4. The same page also suggests you set the block size to 32k on the client. Assuming you are mounting the share from your /etc/fstab, add these options to the relevant line (a combined sketch of items 3 and 4 follows this list):

    rsize=32768,wsize=32768,intr,noatime
    

    As well as increasing the read/write block size, these options will also ensure that NFS operations can be interrupted if there is a hang and will also ensure that the atime won't be constantly updated on files accessed on remote NFS file systems.
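
Putting items 3 and 4 together, the changes would look roughly like the sketch below. This is only an illustration: the export path (ark:/export/bigraid), the mount point and the config file location are assumptions, so adjust them to your own setup.

    # On the server (ark): /etc/default/nfs-kernel-server (Debian) or /etc/sysconfig/nfs,
    # then restart the NFS server (e.g. service nfs-kernel-server restart on Debian)
    RPCNFSDCOUNT=16

    # On the client (ebisu): the relevant /etc/fstab line, then unmount and remount the share
    ark:/export/bigraid  /mnt/raid  nfs4  rsize=32768,wsize=32768,intr,noatime  0  0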

terdon

1. I can scp the file fine. 2. Smaller files don't have the problem. About 760M seems to be the sweet spot; anything more than that runs the risk of stalling. 3. I tried increasing the thread count, and it doesn't seem to have helped. 4. My block size was already set to 32k, actually. – Nathan – 2013-04-23T01:57:45.530

Well I'd say that only leaves NFS. I assume rsync works OK if used over ssh? According to this, the async instead of sync option might be better. You could also try increasing the timeo value. Finally, I've seen a few posts claiming there are some kernel version-specific NFS problems with large file transfers. Are your systems up to date? – terdon – 2013-04-23T02:23:14.233

@Nathan, also see here. – terdon – 2013-04-23T02:30:55.373

I'll read those questions and try using async instead of sync to see if that helps. My systems are up to date; both are using the 3.8 kernel. – Nathan – 2013-04-23T12:43:09.760

async seems to make the problem worse (and I'm not super comfortable with the increased possibility of data loss if the server dies). I also tried messing with the rsize and wsize as mentioned in the other link, but that also doesn't seem to have helped :( – Nathan – 2013-04-23T22:06:00.987

In that case I'm stumped. At least you know it is something to do with NFS and large files; there are many posts on the subject, so it can't be that rare. Happy reading... – terdon – 2013-04-24T00:54:38.090

1

That looks very much like a network problem to me. Some network cards (especially if they are Realtek chips) are not very standard-conforming, especially at 1Gbps and with a switch in between. So you should try the following (a compact sketch of a couple of these checks follows the list):

  • connecting the two without a switch
  • changing the Ethernet cables
  • forcing the connection speed to 1000Mbps full duplex and seeing if the problem persists
  • forcing the connection speed to 100Mbps full duplex and seeing if the problem persists (most often the instability will not show at 100Mbps, and even though this won't be the setup you want, it will help you narrow down the incompatibility)
  • checking ifconfig and ethtool -S ethX for errors
  • checking the MTU using ifconfig and setting it to 1500 if it's higher
  • using ping -f to check for dropped packets between the two, especially with high values of -s (ping packet size) - unstable connections will exhibit packet loss when you run something like ping -f -s 10000 for a couple of seconds
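
A couple of these checks condensed into commands (a rough sketch only; eth0 is an assumed interface name, and the flood ping must be run as root):

    ifconfig eth0 | grep -E 'errors|MTU'     # error/drop counters and the interface MTU
    ethtool -S eth0 | grep -iE 'err|drop'    # per-driver NIC statistics, if the driver exposes them
    ping -f -s 10000 -c 1000 ark             # flood ping with large packets; any loss suggests an unstable link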

Stefan Seidel

tcp can use retransmission to deal with packet loss, but the same mechanism can cause more traffic to flow through an already problematic connection. If the connection problem is over-saturation, this compounds delay and loss. tcp is 'reliable' only in fairly normal network conditions. – belacqua – 2014-08-06T20:47:00.520

I can see why you would be thinking network, but this only happens on nfs. I can scp the file fine, and nfs reads are fine--it's only writes that seem to be the problem. – Nathan – 2013-04-23T01:59:08.223

But the difference is probably that NFS uses UDP, and all the others use TCP, and TCP conceals these network errors by retransmission. – Stefan Seidel – 2013-04-23T06:37:59.163

Actually, I have it mounted with proto=tcp – Nathan – 2013-04-23T12:41:29.017

0

I had the same error message (but it isn't the same problem, since I could reproduce the error each time!).

Running rsync more verbosely (rsync -vv) made it obvious that the target filesystem was full!

    rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
    test/file1 is uptodate
    test/file2 is uptodate
    test/file3 is uptodate
    rsync: recv_generator: mkdir "test/file4" failed: No space left on device (28)
    *** Skipping any contents from this failed directory ***
    rsync: recv_generator: mkdir "test/file5" failed: No space left on device (28)
    rsync: close failed on "test/file6": Input/output error (5)
    rsync: connection unexpectedly closed (78708 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
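
If you suspect the same cause, a quick check on the rsync target is something like the following (the path is just a placeholder):

    df -h /path/to/rsync/target    # 100% usage here would explain the "No space left on device" errors
    df -i /path/to/rsync/target    # running out of inodes produces the same error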

Franklin Piat

Interesting. That's not the problem I'm experiencing, but it's a useful data point! – Nathan – 2015-07-21T20:36:19.230