We have a storage server with a 12-disk RAID-6 pool of about 100 TB. It is used by several client compute nodes via NFS, mostly for deep learning training workloads (lots of small image files, roughly 100 KB each, read in random order).
To speed up file access, we use cachefilesd (FS-Cache) on the compute nodes. This works well because each training job typically reads the same 10-100 GB over and over, which caches nicely on a local SSD.
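For reference, the client-side setup is essentially the stock FS-Cache arrangement; the sketch below is only illustrative (server name, export path, cache directory and thresholds are examples, not our exact values):

```
# /etc/fstab on a compute node: mount the NFS export with FS-Cache enabled (fsc)
storage01:/export/datasets  /mnt/datasets  nfs  rw,vers=4.2,fsc  0 0

# /etc/cachefilesd.conf: cache lives on the local SSD; thresholds are the defaults
dir /ssd/fscache
tag trainingcache
brun  10%
bcull  7%
bstop  3%
```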
Now assume that, in addition to the regular training jobs, another type of workload is started: a heavy read job. This one cannot be cached effectively, because it sweeps through a large amount of data exactly once (which takes a long time), instead of re-reading a working set like the usual training jobs. This job will always have to be served from the actual disks.
The problem we are facing is that even though the regular training jobs can read their data from the local cache, the metadata reads become a bottleneck. Before serving a file from the cache, the client has to check that the cached copy is still up to date, so it asks the NFS server whether the file was modified since it was cached; in practice this means a getattr request for the modification time (mtime) of every file that is read. However, since the storage server is busy serving the heavy read job, it cannot answer these getattr requests from the compute nodes fast enough, and they become the bottleneck.
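For what it is worth, this is roughly how the imbalance shows up in the standard NFS counters (the mount point below matches the illustrative example above):

```
# On a compute node: per-operation call counts of the NFS client
# (getattr dominates, since the file data itself comes from the local cache)
nfsstat -c

# On the storage server: the operation mix that nfsd actually sees
nfsstat -s

# Per-mount detail, including average round-trip time per operation
mountstats /mnt/datasets
```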
Is there any way to tune the server so that these getattr requests are answered very quickly (keeping the regular training jobs fast), even if that means delaying the reads of actual file content for the heavy read job?