4

Two machines are mounting the same NFS share (which lives on a single third machine).

Files get uploaded from either client to this mount, usually in chunks. Server A might handle one chunk and server B the next, all with start and end points defined so that it all adds up in the end.

Still, there have been a few instances where running md5sum on server A gives a different result than running it on server B.

In reality, though, the file is on the NFS server, and as far as I'm aware there should be only one version presented to all clients.

And it's not fixing itself over time.

I'm currently assuming this is a race condition related to the chunks not being written in order, combined with NFS caching: one of the servers may believe the file is at a certain length when it isn't, causing a lot of 0000 0000 padding to be added.
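A minimal local sketch of what I suspect is happening (plain file I/O, not NFS; the helper and file names are made up): if a later chunk is written at its offset before an earlier one arrives, the gap reads back as NUL padding.

```python
import os
import tempfile

def write_chunk(path, offset, data):
    """Hypothetical chunk writer: write `data` at `offset`, creating the file if needed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "upload.bin")
    write_chunk(path, 0, b"A" * 16)    # server A writes chunk 1
    write_chunk(path, 32, b"C" * 16)   # server B writes chunk 3 before chunk 2 exists
    with open(path, "rb") as f:
        content = f.read()
# Until chunk 2 lands, bytes 16..31 read back as zero padding,
# just like the zero run in the hex dumps below.
```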

So why is this happening? Is there a mount option I need to use to prevent it? Is there a way to tell the NFS server to re-sync the file to all clients?

And just in general, how should this be dealt with?

EDIT: Mounting options on the clients:

machine1:~$ nfsstat -m
/mnt/dirA from <SERVER_IP>:/dirA
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

/mnt/dirB from <SERVER_IP>:/dirB
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

machine1:~$ cat /proc/mounts | grep <SERVER_IP>
<SERVER_IP>:/dirA /mnt/dirA nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0
<SERVER_IP>:/dirB /mnt/dirB nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0


machine2:~$ nfsstat -m
/mnt/dirA from <SERVER_IP>:/dirA
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

/mnt/dirB from <SERVER_IP>:/dirB
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

<SERVER_IP>:/dirA /mnt/dirA nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0
<SERVER_IP>:/dirB /mnt/dirB nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0

EDIT2: Both machines are fresh installs of Ubuntu 18.04; the md5sum tool is version 8.28 on both.

EDIT3:

I found this note that I kept on the files. I ran xxd on both machines to get a hex dump, writing the output from the mount to each machine's local filesystem, to be certain it was captured from the point of view of the individual machine. As you can see, according to machine01 there is empty padding in the file, but not according to machine02.
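The capture went roughly like this (I no longer have the exact path on the mount, so the file here is a stand-in):

```shell
# Hypothetical capture: on the real systems, xxd is run against the
# uploaded file on the mount, and the dumps are then compared.
printf 'example\n' > sample.bin     # stand-in for the file on /mnt/dirA
xxd sample.bin > output01           # on machine01: xxd <file-on-mount> > output01
xxd sample.bin > output02           # on machine02: same command, same file
diff output01 output02 && echo "dumps match"
rm -f sample.bin output01 output02
```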

This is the result:

root@machine01:/home/kdguser# grep -C 5 '2ddd5000' output01
2ddd4fb0: 0a78 95ff c53e e2c4 f79a db05 0a59 d7d1  .x...>.......Y..
2ddd4fc0: 85a8 1192 26a6 a25a d741 db3c a61f e72e  ....&..Z.A.<....
2ddd4fd0: 4d0b 97b6 93cc 7845 6ef4 0cca f9aa 9390  M.....xEn.......
2ddd4fe0: 9f00 bacd 707f 2398 f419 e49e 8073 67fb  ....p.#......sg.
2ddd4ff0: 89f5 9450 99f5 808f 4b21 3154 f97f 1271  ...P....K!1T...q
2ddd5000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
2ddd5010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
2ddd5020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
2ddd5030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
2ddd5040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
2ddd5050: ba34 fb76 5af3 69d2 9af0 4711 8a0c eae8  .4.vZ.i...G.....

root@machine02:/home/kdguser# grep -C 5 '2ddd5000' output02
2ddd4fb0: 0a78 95ff c53e e2c4 f79a db05 0a59 d7d1  .x...>.......Y..
2ddd4fc0: 85a8 1192 26a6 a25a d741 db3c a61f e72e  ....&..Z.A.<....
2ddd4fd0: 4d0b 97b6 93cc 7845 6ef4 0cca f9aa 9390  M.....xEn.......
2ddd4fe0: 9f00 bacd 707f 2398 f419 e49e 8073 67fb  ....p.#......sg.
2ddd4ff0: 89f5 9450 99f5 808f 4b21 3154 f97f 1271  ...P....K!1T...q
2ddd5000: c969 a259 431e 2a17 12b4 8365 07cb 5e56  .i.YC.*....e..^V
2ddd5010: fa61 327f eb63 1b13 bc30 eb4b c8f0 af14  .a2..c...0.K....
2ddd5020: 6ebe 3f79 9012 7ece 1662 e104 be19 b249  n.?y..~..b.....I
2ddd5030: 9b9c f61d 180b e92a b93b 9980 aba4 ba41  .......*.;.....A
2ddd5040: 0929 fece fc8a 5309 3883 2562 fe2a 459a  .)....S.8.%b.*E.
2ddd5050: ba34 fb76 5af3 69d2 9af0 4711 8a0c eae8  .4.vZ.i...G.....

So the actual file is the one as seen from machine02, yet machine01 shows something else.

EDIT4: Just to be clear, the length of the file is identical on both clients; only the md5 differs.

KdgDev

2 Answers

4

I recommend reading the "Data And Metadata Coherence" section of the nfs man page.

The NFS version 3 protocol introduced "weak cache consistency" (also known as WCC) which provides a way of efficiently checking a file's attributes before and after a single request. This allows a client to help identify changes that could have been made by other clients.

In particular, you need to use noac:

When noac is in effect, a client's file attribute cache is disabled, so each operation that needs to check a file's attributes is forced to go back to the server. This permits a client to see changes to a file very quickly, at the cost of many extra network operations.
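With the mounts shown in the question, that would look something like these /etc/fstab entries (a sketch: only the relevant options are kept, with noac added). Note that the current acregmin/acregmax of 1 second only shortens the attribute-cache window; noac eliminates it.

```
<SERVER_IP>:/dirA  /mnt/dirA  nfs4  rw,hard,proto=tcp,timeo=600,noac  0  0
<SERVER_IP>:/dirB  /mnt/dirB  nfs4  rw,hard,proto=tcp,timeo=600,noac  0  0
```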

Alas,

The noac mount option prevents the client from caching file metadata, but there are still races that may result in data cache incoherence between client and server.

So you may need to open the file with the O_DIRECT flag if noac doesn't solve the problem for you.

The NFS protocol is not designed to support true cluster file system cache coherence without some type of application serialization. If absolute cache coherence among clients is required, applications should use file locking. Alternatively, applications can also open their files with the O_DIRECT flag to disable data caching entirely.
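For example, a hypothetical chunk writer serialized with POSIX byte-range locks (names are illustrative; with local_lock=none, as in the question's mounts, these locks are sent to the NFS server, and taking a lock also prompts the client to revalidate its cache for the file):

```python
import fcntl
import os

def write_chunk_locked(path, offset, data):
    """Hypothetical writer: lock the chunk's byte range, write, fsync, unlock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive lock over just this chunk's byte range.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
            os.fsync(fd)   # push the chunk to the server before releasing the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Each uploader would call this for its own chunk, so writes to the same file are serialized through the server rather than racing through each client's cache.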

Mark Wagner
  • You can confirm if this is the problem if you use `ls -l filename` on both clients, and see if they show different lengths. – Barmar Sep 11 '19 at 13:50
  • @Barmar No, the problem is the length is identical, but the MD5 is different. I explained that in the original post. – KdgDev Sep 13 '19 at 08:38
  • If the length is identical, then it's not an attribute cache problem, since that uses the same cache. – Barmar Sep 13 '19 at 08:41
0

Disclaimers: First, I do not use Ubuntu. Second, I am "old school." Third, the documentation will probably disagree with me (see the second disclaimer).

BLUF: This is probably a timing, caching, or buffering issue.

Explanation: In the old days, a program would not actually write to disk immediately. The OS would send the file data to a buffer, and when the buffer was (nearly) full, it would be flushed to disk, i.e. the contents of the buffer would then be physically written to the disk itself.

For disk arrays, the disk controller would sometimes also have a cache. Data could arrive at the controller faster than the disk could write it, so it would be cached in the controller until the disk could catch up.

For network traffic, data is generally transmitted in packets. With TCP/IP there is no guarantee that the packets will arrive in the order they were sent, so a buffer holds the packets and re-assembles them in the correct order.

Today, buffers are supposed to be flushed almost immediately. Back in the day, we would run the sync command to force a buffer flush.

The issues that I see here are:
Each server has a "next block number" where it is supposed to start writing when its turn comes. This value could be out of sync between server A and server B.

The cache, or buffer, may not be written out quickly enough. E.g. server A has to send its data to server C; server C has to physically write it to disk; server B has to re-read the file from the disk before it can "see" it.
This means that server B may have a hole in its data from the previous flush of server A, and vice versa.

Server C, the NFS server, could be overloaded with read/write requests. Does server C (the NFS server) show yet a different checksum?

Server A and Server B may not be re-reading fast enough.

Hopefully, this will give you some insight into where to look for answers.

Possible troubleshooting steps: Is it possible to quiet the network, run a few sync commands on each server, and see if the checksums match?
Does the file eventually catch up? You mentioned a hole in the data.

As you can see, according to machine01, there's empty padding in the file, but not according to machine02.

After a time (TBD), does the padding fill in with the missing data? If so, you have a buffering or timing issue. If not, you have a much bigger problem with the entire system design.
Can you revisit the two-server setup? Can you have only one of the servers take over all the writing, and fail over to the other server if necessary?
Are there caching parameters or timing values in your configurations that you can tweak?
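The quiet-network check above could look something like this on each client (shown here against a throwaway file; on the real systems the path would be the file under the mount):

```shell
# Quiet the writers first, then on each client:
f=$(mktemp)                 # stand-in for the file on /mnt/dirA
printf 'test\n' > "$f"
sync                        # flush any dirty client-side pages to the server
md5sum "$f"                 # compare this sum across machine01, machine02, and server C
rm -f "$f"
```

If the sums converge after the sync, a caching/timing issue is the likely culprit; if they never converge, something is genuinely writing different bytes.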

Scottie H
  • The md5 on the actual NFS server would be one of the two on the clients, but never a different one altogether. The system has already been altered to use one server consistently per file (as you point out), and it hasn't occurred since then. Currently I have no example file I could perform the sync on. No sync was ever done and no changes in the files were ever witnessed. They're gone now. – KdgDev Sep 10 '19 at 13:25