3

Is there a way to gather statistics about blocks being accessed on a disk?

I have a scenario where a task is both memory- and I/O-intensive, and I need to find a good balance between how much of the available RAM I can assign to the process and how much I should leave for the system to use as I/O cache for the block device being used.

I suspect that most of the I/O that is currently happening is accessing a rather small subset of files (or parts of large files) and that performance could be optimized by increasing the RAM that is available for I/O buffering.

Ideally, I would be able to create something like a "heat-map" that shows me which parts of the files are accessed most of the time.

The setup is currently based on CentOS 5 on an AWS/EC2 m1.large instance. Disk setups are either ephemeral block devices in a RAID0 arrangement (LVM) or, alternatively, a single (500 GB) EBS volume.

Update: Originally, this question was talking about disk blocks, which was misleading as I am actually interested in the logical blocks being accessed and I don't care where they are on the physical devices. I changed this to make clear that it is parts of files I'm interested in. I apologize for the confusion.

VoidPointer
  • 2
    Please provide more context and information about your hardware. Server make/model, storage information (RAID controller, RAID arrangement) also matters. Finally, what operating system distribution and version are you using? – ewwhite Nov 09 '12 at 16:19
  • Sorry for being so vague. Updated the question with some information about the setup. – VoidPointer Nov 12 '12 at 11:34
  • See my answer below about `vmtouch`. That's a tool that you can use to force files into the disk/virtualmemory cache. – ewwhite Nov 12 '12 at 12:47

4 Answers

3

I'm not sure you fully understand how modern buffer caches work -- you're about half right in that you want to limit how much RAM your process uses (so there's "enough" available for the buffer cache), but you're thinking about it in the wrong way.

What you're asking for is not really useful for tuning the buffer cache -- it MIGHT be useful if you have a single contiguous disk (or an array that presents as one and behaves as one) and are looking at optimizing on-disk layout, but that's getting into Deep Filesystem Magic.
You can read McKusick's papers on filesystem design (or spend 42 minutes and watch this great video) to get a basic concept of how the filesystem already tries to optimize that for you - filesystems are pretty good at getting the on-disk layout right.


In terms of buffer cache optimization, you want to look at the number of cache hits vs. cache misses (and specifically what's causing the misses). The physical location on the disk doesn't matter - what matters is how many times you have to hit the disk to get what you want, and whether your cache is big enough that it's not constantly churning (essentially negating the cache efficiency).
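If you want to quantify that on a running system, a rough sketch using standard Linux tools looks something like the following (exact fields vary by distribution and sysstat version):

    # How much RAM the kernel is currently devoting to the page cache
    grep -E 'MemTotal|MemFree|Buffers|^Cached' /proc/meminfo

    # Watch block-in traffic ("bi"): sustained reads while your working set
    # should already be cached are a sign of cache misses / churn
    vmstat 5

    # Major page faults per second (majflt/s) -- each one is a read that had
    # to go to disk instead of being served from the page cache (needs sysstat)
    sar -B 5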

Tuning that is a bit more trial-and-error than anything else -- a grossly inefficient rule of thumb is to leave 2x the size of your biggest file/chunk of data for the buffer cache, but you're almost always better off starting hugely skewed to either the app or the cache and adjusting toward peak performance.

voretaq7
  • Thank you. My question was worded in a very misleading way because I talked about disk blocks when I was actually interested in what parts of the logical file layout are accessed most frequently and where most of the cache-misses are occurring. (I updated the question) – VoidPointer Nov 12 '12 at 11:48
1

If you're talking about a server-class system, there are other variables to consider. I understand what you're asking for, but on modern systems, these things have been abstracted by multiple levels of cache and the optimizations from intelligent RAID controllers.

For write-biased activity, much of your random write workload should be written to a battery- or flash-backed non-volatile cache (in order to provide quick acknowledgement of writes), then coalesced and flushed sequentially to your disks. If you're not using something like this, you're leaving performance on the table.

For read activity, the OS does a reasonable job of caching. Having additional controller cache helps. And beyond that, you can use a few tricks to help control your virtual memory subsystem (see: Virtual Memory Toucher).
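A minimal sketch of how `vmtouch` can be used here (assuming it is installed; the paths are placeholders):

    # Report how much of a file or directory tree is currently resident
    # in the page cache
    vmtouch -v /path/to/data

    # Pre-load ("touch") the files into the page cache
    vmtouch -t /path/to/data

    # Lock the files into memory so the kernel won't evict them,
    # running vmtouch as a daemon to hold the lock
    vmtouch -dl /path/to/hot/files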

Also see: Clear / Flush cached memory

But again, we need details of your setup to help understand how to help.

ewwhite
  • vmtouch is a very interesting pointer in the right direction. I am indeed mostly concerned with read access at this time. What I'm after is the amount of RAM that is needed to provide a better cache hit rate when those large files are accessed... – VoidPointer Nov 12 '12 at 13:05
  • That will depend on the size of those files. One thing `vmtouch` can do for you right now is tell you how much of the existing files are in cache. The other thing it can do for you is lock the directory (or specific files) into cache for you to ensure that they're in the VM subsystem. Either way, that gets you what you want - serving the critical files from RAM. – ewwhite Nov 12 '12 at 13:11
  • The files are too large to fit in RAM. However, there is reason to believe that only a subset (certain chunks) of those files is used most of the time. I would like to find out how large that subset is. I.e. what parts of those files are fetched from disk most often because of a cache miss. – VoidPointer Nov 12 '12 at 23:15
  • Yes. Use the vmtouch tool on the file(s) in question to see HOW MUCH OF THE FILE EXISTS IN CACHE... `vmtouch -v /path/to/file` will show this. – ewwhite Nov 12 '12 at 23:24
0

Use iotop. That is exactly what you need.
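For example (flags as documented in the iotop man page; note that iotop needs kernel I/O-accounting support, which very old kernels such as the stock CentOS 5 one may lack):

    # Show only the processes/threads that are actually doing I/O right now
    iotop -o

    # Batch mode with accumulated totals, handy for logging over a period
    iotop -b -a -o -n 10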

John Siu
  • While iotop is nice, it doesn't show which parts of the disks are accessed. – Cristian Ciupitu Nov 09 '12 at 16:16
  • 1
    @CristianCiupitu Disk locality is a vague concept these days (I can't remember the last time I saw a server that wasn't using some kind of RAID). Accessing (C/H/S) 1/2/3 on the drive the OS sees could be one of several locations on the physical media. – voretaq7 Nov 09 '12 at 16:51
  • 2
    @voretaq7: I don't see the question as asking for the exact physical location on disk, but as asking whether the same files or parts of them are read/written multiple times. As you mention in your answer, it's more a matter of cache hits or misses. – Cristian Ciupitu Nov 09 '12 at 18:13
  • @CristianCiupitu Yeah - either way `iotop` won't cut it: it'll show you your I/O pigs, but (AFAIK) not what/where they're grabbing data from. – voretaq7 Nov 09 '12 at 18:26
  • @CristianCiupitu: Thanks, you are absolutely correct - I updated the question and hope it is no longer so misleading. – VoidPointer Nov 12 '12 at 11:51
0

I will put my vote in for DSTAT (http://dag.wieers.com/home-made/dstat/). Take a look at some of the switches like top-io, top-latency, top-mem, etc. It is not going to give you a heat map or show which parts of the disk are being accessed, but it may help point you in the right direction.
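For example, a rough sketch (the --top-* plugins are only present in newer dstat releases, so check what your version supports):

    # Per-interval view of the most expensive I/O and memory consumers,
    # alongside overall disk throughput, sampled every 5 seconds
    dstat --top-io --top-bio --top-mem -d 5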

J Henzel