I have a backup server that we've been using for some time. It's a FreeBSD server running ZFS, with the data served over NFS. The export is simple: /backup/vm -maproot=root -alldirs. In case it's relevant, the export was configured through ZFS:
zfs get sharenfs
backup/vm sharenfs -maproot=root -alldirs local
It's been running fine, and we've even restored from these backups. Today I discovered, purely by accident, that files read from the NFS share don't match what was written (and what's on the server).
To demonstrate: on the server we have
pg11.txt (downloaded on the server)
pg11.txt.1 (uploaded by a client over nfs)
Both of which are Alice in Wonderland, downloaded from here: http://www.gutenberg.org/cache/epub/11/pg11.txt
On the nfs server:
md5 pg11.txt*
MD5 (pg11.txt) = eff1e5d84df1d3a543d1c578192a2367
MD5 (pg11.txt.1) = eff1e5d84df1d3a543d1c578192a2367
So far so good. Now on a client:
md5sum pg11.txt*
4d79d99b8eebe364cddf5ce42949bc3e pg11.txt
eff1e5d84df1d3a543d1c578192a2367 pg11.txt.1
What? Reading pg11.txt from the client, I can easily find lines like:
Alice started to her feet, for it flashed across her <80>^A^@<80>^V<A0>R+^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^A^@^@^A<A4>^@^@^@^A^^@^@^@^@^@^@^@^@^@^@^@^@^B<8E>^^@^@^@^@^@^B^B^@^@^@^@f7<D9>^@^@^@^@^@^@^V^V<EE>3^@^@^@^@^@^@^BFT^B<8C<FF>^E<D9>m(T^B<8C><E7>^]<CE>[<95>T^B<8C><E7>^]<CE>[<95>^@^A^@^@^@^@^@^@^@^A^@^@<U+FEFF>Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll
Now on a different client:
md5sum pg11.txt*
eff1e5d84df1d3a543d1c578192a2367 pg11.txt
b9c4076a85a151e730b9a9077fd6023b pg11.txt.1
The second client again, but mounted over TCP:
md5sum pg11.txt*
d80ce8c17092b1b759295e27a3c0af60 pg11.txt
14cde84fd05bd39845c9bb8fc0042eda pg11.txt.1
The previous clients were both XenServer 6.2. If I try an Ubuntu system:
md5sum pg11.txt*
eff1e5d84df1d3a543d1c578192a2367 pg11.txt
81ca4f5b9b334d00a07fcb16f444a60a pg11.txt.1
So every client seems to have a different picture, and usually not the right one. I'm hoping someone can give me some clue as to what's happening here and how to fix it, because I'm well stumped.
Edit:
The various files, including a diff, can be found here: https://gist.github.com/Whoops/0fbe1751675d5e344d43. It appears that the start of the file is repeated several (7) times, preceded by the same binary string each time. It's also interesting that the corruption appears to be consistent for each client: each client always sees the same corrupted version, rather than different corruption on each read.
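For anyone who wants to check their own copies for the same pattern, here is a minimal Python sketch of the comparison. It is not the tool I used for the gist. The function names and the 64-byte probe length are arbitrary choices for illustration, and it assumes you have a known-good copy of the file (for example, one fetched straight from the server's local disk) alongside the copy read over NFS.

```python
import sys
from typing import List


def first_mismatch(good: bytes, bad: bytes) -> int:
    """Return the offset of the first differing byte, or -1 if identical."""
    for i, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return i
    # One file is a strict prefix of the other: they differ where the shorter ends.
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return -1


def restart_offsets(good: bytes, bad: bytes, probe_len: int = 64) -> List[int]:
    """Return every offset in `bad` where the opening bytes of `good` appear.

    More than one hit suggests the start of the file has been repeated.
    probe_len is an arbitrary choice, not something dictated by NFS.
    """
    probe = good[:probe_len]
    if not probe:
        return []
    offsets = []
    pos = bad.find(probe)
    while pos != -1:
        offsets.append(pos)
        pos = bad.find(probe, pos + 1)
    return offsets


# Usage: python compare.py <known-good copy> <copy read over NFS>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        good_data = f.read()
    with open(sys.argv[2], "rb") as f:
        bad_data = f.read()
    print("first mismatch at byte:", first_mismatch(good_data, bad_data))
    print("file start found at offsets:", restart_offsets(good_data, bad_data))
```

Running it twice against the same NFS copy and comparing the output is a quick way to confirm whether a client really does see the same corruption on every read.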
Edit2:
The problem occurs with both NFSv3 and NFSv4. It appears to occur only on Linux clients, not on other FreeBSD machines. The tested clients are XenServer 6.2 and Ubuntu 10.04, which means that if it's a client bug, it spans kernel versions 2.6 - 3.11. I don't currently have another FreeBSD server to test with.