36

Here is the output of `free -m`:

             total       used       free     shared    buffers     cached
Mem:          7188       6894        294          0        249       5945
-/+ buffers/cache:        698       6489
Swap:            0          0          0

I can see that almost 6 GB (5945 MB) of the 7 GB of memory is used for caching files. I know how to flush the caches. My question is: is it possible to see which files (or inodes) are being cached?

ssapkota
  • I don't know the answer but 2 things are of interest: How do you flush the caches? Why is that of interest, I'm not implying anything here - just interested in the use case – Martin M. Jun 08 '11 at 20:05
  • This flushes both the `buffers` and `cached`: `sysctl -w vm.drop_caches=3`. You might want to read more on it before using it. Sometimes it's just needed. It's available - this should be another reason :) – ssapkota Jun 08 '11 at 20:07
  • A lot of people are [asking for it](http://www.google.com/search?&q=clear+cache+%2Blinux). There should be some reason. – ssapkota Jun 08 '11 at 20:18
  • Dropping caches comes in handy if you want to do some I/O-related performance measurements and do not want to have them "spoiled" by OS caching – the-wabbit Jun 09 '11 at 08:22
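
For readers who want to try the flush mentioned in the comments above, here is a minimal sketch of the same operation done through `/proc/sys/vm/drop_caches`, which is the file that `sysctl -w vm.drop_caches=3` writes to. The function name `drop_caches` and the `DROP_CACHES_FILE` override are made up for this illustration; on a real system the write needs root.

```shell
#!/bin/bash
# Sketch only: drop clean page cache, dentries and inodes.
# DROP_CACHES_FILE is a hypothetical override so the function can be tried
# against a scratch file; by default it points at the real kernel interface.
DROP_CACHES_FILE=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}

drop_caches() {
    # 1 = page cache, 2 = dentries and inodes, 3 = both
    local mode=${1:-3}
    # Dirty pages cannot be dropped, so write them back to disk first
    sync
    echo "$mode" > "$DROP_CACHES_FILE"
}
```

Calling `drop_caches` as root right before an I/O benchmark gives the cold-cache starting point described in the last comment; running `free -m` before and after should show the `cached` figure shrink.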

5 Answers

34

Well, there is an easy way to take a look at the kernel's page cache if you happen to have ftools: `fincore` gives you summary information on which files' pages are currently held in the cache.

You will need to supply a list of file names to check for their presence in the page cache. This is because the information stored in the kernel's page cache tables contains only data block references, not file names. fincore resolves a given file's data blocks through inode data and searches for the respective entries in the page cache tables.

There is no efficient search mechanism for doing the reverse - getting a file name belonging to a data block would require reading all inodes and indirect blocks on the file system. If you need to know about every single file's blocks stored in the page cache, you would need to supply a list of all files on your file system(s) to fincore. But that again is likely to spoil the measurement as a large amount of data would be read traversing the directories and getting all inodes and indirect blocks - putting them into the page cache and evicting the very page cache data you were trying to examine.
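
The approach described above, handing fincore a list of every file under a directory, could look roughly like the sketch below. It is only an illustration: it assumes the `fincore` that ships with util-linux (the ftools variant may take different options), and the names `FINCORE` and `cache_report` are invented for the example.

```shell
#!/bin/bash
# Sketch only: report page-cache residency for every non-empty regular file
# under a directory. FINCORE is an assumed, overridable tool name; it defaults
# to the `fincore` from util-linux.
FINCORE=${FINCORE:-fincore}

cache_report() {
    local root=$1
    # -xdev stays on one file system; -size +0c skips empty files, which
    # cannot have any pages in the cache. -print0 / -0 keep odd file names intact.
    find "$root" -xdev -type f -size +0c -print0 | xargs -0 -r "$FINCORE"
}
```

For example, `cache_report /var/log` checks a single subtree. Keeping the tree small limits the side effect described above, where walking the directories itself pulls metadata into memory and can evict the data being measured.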

the-wabbit
  • fincore does inform if a file is present in cache or not. However, is there any tool which will list all the files that are cached? (fincore takes the file name as input and searches. I want to look into all the entries that are currently cached.) – Joe Nov 12 '14 at 09:06
  • @Joe I suppose that the information stored in the kernel's page cache tables contains only data block references and not file names. `fincore` would resolve a given file's data blocks through inode data and search for respective entries in the page cache tables. There is no efficient search mechanism for doing the reverse - getting a file name belonging to a data block would require reading all inodes and indirect blocks on the file system. Thus, algorithmically you will be better off supplying a list of all files on your file system to `fincore` if you really need this level of information. – the-wabbit Nov 12 '14 at 11:45
  • @the-wabbit Thanks. Other than files, are there other things that are part of cache, such as descriptors, shared memory etc. – Joe Nov 12 '14 at 15:43
  • @Joe Unfortunately, I am not that deep into Kernel internals to give an authoritative answer on this topic. The page cache seems generic enough to cache other types of data than just file system data blocks, but I am not aware of any examples. – the-wabbit Nov 13 '14 at 10:03
  • -1 The mentioned project appears to have been unmaintained for years; `vmtouch`, as mentioned in https://serverfault.com/a/643784/26218, provides this functionality and appears to be active. I would consider deleting this answer or at least editing it (if deleting is not possible). – Flow Jun 04 '18 at 14:34
  • The StackExchange stance on [what to do with obsolete answers](https://meta.stackexchange.com/questions/261817/how-do-we-encourage-edits-to-obsolete-out-of-date-answers) is somewhat ambiguous. Deleting or substantially changing accepted answers is frowned upon. Changing this answer to recommend vmtouch would duplicate @ewwhite's existing answer, which has a similar number of upvotes. So simply further upvoting ewwhite's answer should do the trick, right? – the-wabbit Jun 06 '18 at 11:00
26

You can use the vmtouch utility to see if a named file or directory is in cache. You can also use the tool to force items into cache or lock them into cache.

[root@xt ~]# vmtouch -v /usr/local/var/orca/procallator.cfg
/usr/local/var/orca/procallator.cfg
[     ] 0/5

           Files: 1
     Directories: 0
  Resident Pages: 0/5  0/20K  0%
         Elapsed: 0.000215 seconds

Now I can "touch" it into cache.

[root@xt ~]# vmtouch -vt /usr/local/var/orca/procallator.cfg
/usr/local/var/orca/procallator.cfg
[OOOOO] 5/5

           Files: 1
     Directories: 0
   Touched Pages: 5 (20K)
         Elapsed: 0.005313 seconds

Now to see how much is cached...

[root@xt ~]# vmtouch -v /usr/local/var/orca/procallator.cfg
/usr/local/var/orca/procallator.cfg
[OOOOO] 5/5

           Files: 1
     Directories: 0
  Resident Pages: 5/5  20K/20K  100%
         Elapsed: 0.000241 seconds
ewwhite
5

You can also use pcstat (Page Cache Stat) https://github.com/tobert/pcstat

Hope it helps someone.

blavoie
5

I wrote the following script, which prints all files and their cache status using the pcstat command. It is a self-contained script for x86_64 Linux systems and downloads pcstat if needed.

The first argument is the file system location to analyze and the second argument is the number of results (top N by number of pages in cache).

#!/bin/bash
# Exit if an unset variable is used
set -o nounset
# Exit on first error
set -o errexit

if [ $# -eq 0 ]; then
    echo "Usage: $0 <root-dir> [number-of-results]"
    echo
    echo "Example: $0 /var 10"
    echo "will show the top 10 files in /var which are loaded in cache"
    exit
fi

ROOT=$1
# Number of results to show (defaults to 50 when the second argument is omitted)
HOW_MANY=${2:-50}

# Download pcstat next to this script if it is not already there
SCRIPT_DIR="$( cd -P "$( dirname "$0" )" && pwd )"
if [ ! -x "$SCRIPT_DIR/pcstat" ]; then
    (
        cd "$SCRIPT_DIR"
        rm -f pcstat
        curl -L -o pcstat https://github.com/tobert/pcstat/raw/2014-05-02-01/pcstat.x86_64
        chmod +x pcstat
    )
fi

# List non-empty regular files, skipping /proc and /sys,
# then sort numerically on field 6 of pcstat's terse output, largest first
find "$ROOT" \( -path /proc -o -path /sys \) -prune -o -type f -size +0c -print0 |
    xargs -0 "$SCRIPT_DIR/pcstat" -terse -nohdr |
    sort --field-separator=, -r -n -k 6 |
    head -n "$HOW_MANY"
Nadddy
4

I wrote a very simple shell script that shows cached files using linux-fincore. Since the cache is part of memory, the script finds the 10 processes with the highest RSS usage, uses lsof to list the files those processes have open, and finally uses linux-fincore to check whether those files are cached.

Please correct me if my thinking is wrong.

#!/bin/bash
# Author: Shanker
# Time: 2016/06/08

# linux-fincore must be installed
if [ ! -f /usr/local/bin/linux-fincore ]
then
    echo "You haven't installed linux-fincore yet" >&2
    exit 1
fi

# PIDs of the 10 processes with the largest resident set size (RSS)
ps -e -o pid,rss | sort -nk2 -r | head -10 | awk '{print $1}' > /tmp/cache.pids
# To check every process instead:
# ps -e -o pid > /tmp/cache.pids

# Start from empty result files
: > /tmp/cache.files
: > /tmp/cache.fincore

# Collect the NAME column (field 9) of everything those processes have open
while read -r pid
do
    lsof -p "$pid" 2>/dev/null | awk '{print $9}' >> /tmp/cache.files
done < /tmp/cache.pids

# Keep only regular files, each listed once
sort -u /tmp/cache.files | while read -r f
do
    if [ -f "$f" ]
    then
        echo "$f" >> /tmp/cache.fincore
    fi
done

# Report how much of each file is in the page cache
xargs -r -d '\n' linux-fincore -s < /tmp/cache.fincore

rm -f /tmp/cache.{pids,files,fincore}
  • The set of files in the cache is typically going to be way larger than the small subset of currently-open ones (unless cache space is small). The currently-open files are most likely present in the cache (unless those were long-idle or cache was recently cleaned). Note: `lsof` also reports files mapped into process-address-space (and not necessarily cached). Also likely that large share of files is going to be only partially/sparsely cached... – Vlad Nov 22 '17 at 05:16