88

I have a rather old server with 4 GB of RAM that is pretty much serving the same files all day, but it is doing so from the hard drive while 3 GB of RAM sit "free".

Anyone who has ever tried running a RAM drive can attest that it's awesome in terms of speed. The memory usage of this system rarely exceeds 1 GB of the 4 GB available, so I want to know if there is a way to use that extra memory for something good.

  • Is it possible to tell the filesystem to always serve certain files out of RAM?
  • Are there any other methods I can use to improve file reading capabilities by use of RAM?

More specifically, I am not looking for a 'hack' here. I want filesystem calls to serve the files from RAM without my needing to create a RAM drive and copy the files there manually, or at least a script that does this for me.

Possible applications here are:

  • Web servers with static files that get read a lot
  • Application servers with large libraries
  • Desktop computers with too much RAM

Any ideas?

Edit:

  • Found this very informative: The Linux Page Cache and pdflush
  • As Zan pointed out, the memory isn't actually free. What I mean is that it's not being used by applications and I want to control what should be cached in memory.
ewwhite
Andrioid
  • I too am seeking something along these lines. I don't think that general filesystem disk block caching is the answer. Suppose that I want disk block X to always be cached. Something accesses it, and the kernel caches it. So far so good, but the next process wants block Y, so the kernel discards my block X and caches Y instead. The next process that wants X will have to wait for it to come off the disk; that's what I want to avoid. What I would like (and what I think the original poster is after too) is to overlay a write-through cache onto a filesystem that will guarantee the files are always cached –  Feb 07 '10 at 01:03
  • Given that the consensus seems to be that Linux should already be caching frequently-used files for you, I'm wondering if you actually managed to make any improvements using the advice found here. It seems to me that trying to manually control caching might be useful to warm up the cache, but that with the usage pattern you describe ("serving the same files all day"), it wouldn't help an already-warmed-up server much, if at all. – Nate C-K Nov 20 '14 at 15:27
  • You say you're not looking for a hack, but Linux already does what you want to do by default. The following equation: "serving the same files all day" + "tell the filesystem to always serve certain files out of RAM" equals "hack" by definition. Did you actually notice any performance improvements? In my experience, Linux caches the bejeezus out of your filesystem reads. – Mike S Oct 31 '16 at 15:04
  • For clarification, Linux does cache files, but the metadata is validated for each file on each request. On spinning rust, on a busy web server with a lot of small files, that can still cause IO contention and prematurely wear out drives. Static content and scripts can be rsync'd into /dev/shm or a custom tmpfs mount on app startup. I've done this for a couple of decades and my drives don't wear out prematurely. Also, my sites withstand heavy burst load much better this way. This helps on anything from the most expensive enterprise hardware to commodity hardware. – Aaron Mar 16 '17 at 12:38

18 Answers

73

vmtouch seems like a good tool for the job.

Highlights:

  • query how much of a directory is cached
  • query how much of a file is cached (also which pages, graphical representation)
  • load file into cache
  • remove file from cache
  • lock files in cache
  • run as daemon

vmtouch manual

EDIT: Usage as asked in the question is listed in example 5 on the vmtouch homepage.

Example 5

Daemonise and lock all files in a directory into physical memory:

vmtouch -dl /var/www/htdocs/critical/

EDIT2: As noted in the comments, there is now a git repository available.

seeker
    For future viewers, try to use the vmtouch [git repository](https://github.com/hoytech/vmtouch) instead of following the instructions on the linked page. That way you get a makefile and can pull updates. – randomous Aug 04 '15 at 05:19
  • Seems that there's a limit to the size of the file (4GB). Is there any other alternative? – Alix Axel Oct 16 '15 at 16:12
  • Ok, here's my actual use case: a RPi1 with an old SD card, out there somewhere doing Stuff. Before I get to make a trip there and replace the card (and possibly power supply), I want the OS to touch the card sparingly, preferably never. FS cache is good but beyond my control; /bin and /sbin are already on tmpfs, getting /home/user likewise has other drawbacks. `vmtouch` fits this niche well. – Piskvor left the building Jan 10 '19 at 16:28
  • How does vmtouch work differently from tmpfs? – Edward Torvalds Jan 25 '19 at 10:26
31

This is also possible using the vmtouch Virtual Memory Toucher utility.

The tool allows you to control the filesystem cache on a Linux system. You can force or lock a specific file or directory into the VM cache subsystem, or use it to check which portions of a file/directory are held in VM.

How much of the /bin/ directory is currently in cache?

$ vmtouch /bin/
           Files: 92
     Directories: 1
  Resident Pages: 348/1307  1M/5M  26.6%
         Elapsed: 0.003426 seconds

Or...

Let's bring the rest of big-dataset.txt into memory...

$ vmtouch -vt big-dataset.txt
big-dataset.txt
[OOo                                                 oOOOOOOO] 6887/42116
[OOOOOOOOo                                           oOOOOOOO] 10631/42116
[OOOOOOOOOOOOOOo                                     oOOOOOOO] 15351/42116
[OOOOOOOOOOOOOOOOOOOOOo                              oOOOOOOO] 19719/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOo                        oOOOOOOO] 24183/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                  oOOOOOOO] 28615/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo              oOOOOOOO] 31415/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo      oOOOOOOO] 36775/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo  oOOOOOOO] 39431/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 42116/42116

           Files: 1
     Directories: 0
   Touched Pages: 42116 (164M)
         Elapsed: 12.107 seconds
ewwhite
26

A poor man's trick for getting stuff into the filesystem cache is to simply cat it and redirect that to /dev/null.
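
For example, a minimal sketch (the /var/www/static path is only an assumption for illustration), with a crontab entry to keep the files warm, as a comment below also suggests:

cat /var/www/static/* > /dev/null

# crontab entry: re-read the files every 10 minutes so they stay cached
*/10 * * * * find /var/www/static -type f -exec cat {} + > /dev/null 2>&1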

cagenut
  • Agree. And if you want to ensure certain files are cached, make a cron job which `cat`s the file to /dev/null periodically – Josh Jul 21 '09 at 12:31
  • A cron job is a certain inconvenience, but could you comment on whether there is any disadvantage to using `cat file` (vs. e.g. `vmtouch -vt file`) to load a file into the disk buffer? – user1079505 Sep 09 '20 at 01:20
  • @user1079505 The disadvantage is that if parts of the file are missing from RAM, they will be read from the disk again. If you use `vmtouch -l` they are locked in RAM and there is no need to read them again. Maybe you ask yourself why they got dropped: that happens because other files are read from the disk and cached as well. If the cached files are read often, they will stay in RAM, but if you read a huge file that is bigger than your RAM, it will overwrite the complete cache (not if you locked files with `vmtouch`). More info: https://unix.stackexchange.com/a/539180/101920 – mgutt Oct 10 '20 at 08:14
  • @cagenut An "answer to this answer": https://unix.stackexchange.com/a/156220/101920 – mgutt Oct 10 '20 at 08:24
20

Linux will cache as much disk IO in memory as it can. This is what the cache and buffer memory stats are. It'll probably do a better job than you will at storing the right things.

However, if you insist on storing your data in memory, you can create a RAM drive using either tmpfs or ramfs. The difference is that ramfs will allocate all the memory you ask for, whereas tmpfs will only use as much memory as the data you store in it. My memory is a little rusty, but you should be able to do:

 # mount -t ramfs ram /mnt/ram 

or

 # mount -t tmpfs tmp /mnt/tmp

and then copy your data to the directory. Obviously, when you turn the machine off or unmount that partition, your data will be lost.
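
A hedged sketch of the full round trip (the paths and the 2g size cap are assumptions; the copy has to be re-run after every boot since the contents are volatile):

mkdir -p /mnt/tmp
mount -t tmpfs -o size=2g tmp /mnt/tmp
rsync -a /srv/www/static/ /mnt/tmp/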

David Pashley
  • Thanks for your answer, but this is obviously what I want to avoid. Otherwise I'd just script it so the computer would create the ramdrive, copy the files and symbolically link to the ramdrive. But then my data is inconsistent. I was hoping for a filesystem where I can 'tag' certain files to be cached in memory. But maybe I'm a bit too optimistic. – Andrioid Jul 21 '09 at 07:21
  • You "tag" files to be cached by accessing them. – womble Jul 21 '09 at 07:35
  • If only there was some way to automatically tag the most commonly used files. – David Pashley Jul 21 '09 at 07:53
  • Depending on your available memory, it's done automatically – asdmin Jul 21 '09 at 08:16
  • @David, it is done automatically. To test it, run: time cat some_large_file > /dev/null. Then run that again. You'll notice that on the second run the time is far less because the file has been cached. – Josh Jul 21 '09 at 12:29
  • Blimey, sarcasm doesn't travel well, does it :) – David Pashley Jul 21 '09 at 13:37
  • Further to Josh's description: you could `grep pattern1 some_large_file` and then follow that with `grep anotherPattern same_large_file`; the response time on the second one will be much better. – nik Jul 21 '09 at 18:47
  • Yes, thank you. I understand the concept of IO caching. I even explained it in my answer. Seems you didn't read the subtle comment that it was sarcasm. – David Pashley Jul 21 '09 at 18:55
18

After some extensive reading on the 2.6 kernel's swapping and page-caching features, I found 'fcoretools', which consists of two tools:

  • fincore: reveals how many pages of a file are held in core memory
  • fadvise: allows you to manipulate the core memory (page cache).

(In case someone else finds this interesting, I'm posting it here.)
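
For instance (a sketch: flags vary between fincore implementations, and the paths are illustrative; a comment further down shows the same fadvise usage):

# how much of each file currently sits in the page cache
fincore /var/www/htdocs/*.html

# hint the kernel to pull the files into the cache
find /var/www/htdocs -type f | xargs fadvise -willneed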

Andrioid
9

There are two kernel settings that can help considerably even without using other tools:

swappiness

tells the Linux kernel how aggressively it should use swap. Quoting the Wikipedia article:

Swappiness is a property for the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100 inclusive. A low value means the kernel will try to avoid swapping as much as possible where a higher value instead will make the kernel aggressively try to use swap space. The default value is 60, and for most desktop systems, setting it to 100 may affect the overall performance, whereas setting it lower (even 0) may improve interactivity (decreasing response latency.)

vfs_cache_pressure

Quoting from vm.txt:

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. ...


By setting swappiness high (like 100), the kernel moves everything it doesn't currently need out to swap, freeing RAM for caching files. And by setting vfs_cache_pressure lower (let's say to 50, not to 0!), it will favor caching files over keeping application data in RAM.

(I work on a large Java project, and every time I ran it, it took a lot of RAM and flushed the disk cache, so the next time I compiled the project everything was read from disk again. By adjusting these two settings, I managed to keep the sources and compiled output cached in RAM, which speeds up the process considerably.)
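
A minimal sketch of applying the values discussed above (tune the numbers to your own workload):

# apply immediately
sysctl vm.swappiness=100
sysctl vm.vfs_cache_pressure=50

# persist across reboots by adding to /etc/sysctl.conf:
# vm.swappiness = 100
# vm.vfs_cache_pressure = 50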

Petr
4

You may be able to have a program that just mmaps your files and then stays running.

Brad Gilbert
3

If you have plenty of memory you can simply read in the files you want to cache with cat or similar. Linux will then do a good job of keeping them around.

2

I very much doubt that it is actually serving files from the disk with 3 GB RAM free. Linux file caching is very good.

If you are seeing disk IO, I would look into your logging configurations. Many logs get set as unbuffered, in order to guarantee that the latest log information is available in the event of a crash. In systems that have to be fast regardless, use buffered log IO or use a remote log server.
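
With sysklogd-style daemons (including rsyslog), prefixing the log file name with a minus sign omits the sync after every message; a sketch of what the relevant /etc/syslog.conf line would look like:

# the leading "-" means buffered writes (no sync per message)
mail.info    -/var/log/mail.info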

Zan Lynx
0

Desktop systems (e.g. Ubuntu) already preload files (at least popular shared libraries) into memory on boot. This is used to speed up booting and the startup time of various bloatware like FF, OO, KDE and GNOME (with the Evolution bloat-mailer).

The tool is named readahead: http://packages.ubuntu.com/dapper/admin/readahead

There is also a corresponding syscall, readahead(2): http://linux.die.net/man/2/readahead

There is also a preloading daemon project, preload: http://linux.die.net/man/8/preload

osgx
0

memlockd (http://www.coker.com.au/memlockd/) does this, though you really don't need it: Linux will do a pretty good job of caching the files you are using on its own.

Justin
0

There are various ramfs systems you can use (e.g. ramfs, tmpfs), but in general, if files are actually being read that often, they sit in your filesystem cache. Files only get evicted when your working set is larger than your free RAM, and in that case there's no way you'd fit it all into a RAM disk either.

Check the output of the "free" command in a shell: the value in the last column, under "Cached", is how much of your RAM is being used for the filesystem cache.
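
Note that newer versions of free label this column buff/cache; you can also read the figures straight from /proc/meminfo:

free -m
grep -E '^(Buffers|Cached)' /proc/meminfo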

Daniel Lawson
0

As for your latter question, ensure that your RAM is sitting on different memory channels so that the processor can fetch the data in parallel.

sybreon
0

I think this might be better solved at the application level. For instance, there are probably specialized web servers for this, or you might consider mod_cache with Apache. If you have a specific goal, such as serving web content faster, then I think you can get improvements from this sort of thing.
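
As a sketch of the Apache route (standard 2.4 directives, but the /static path is an assumption), mod_cache can keep responses in shared memory via mod_cache_socache:

LoadModule cache_module modules/mod_cache.so
LoadModule cache_socache_module modules/mod_cache_socache.so
LoadModule socache_shmcb_module modules/mod_socache_shmcb.so

# cache responses under /static in a shared-memory cache
CacheSocache shmcb
CacheEnable socache /static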

But your question is general in nature; the Linux memory subsystem is designed to provide the best general use of RAM. If you want to target certain types of performance, consider looking at everything in /proc/sys/vm.

The fcoretools package is interesting; I'd be interested in any articles about its application. This link talks about the actual system calls used in an application.

Kyle Brandt
  • find /var/lib/mysql | xargs fadvise -willneed (dirty, but it should provide faster access to the database files; as an example) – Andrioid Jul 21 '09 at 18:03
  • Very good hack, but such a hack doesn't eliminate the many waiting fsyncs from MySQL :( fsyncs are needed to ensure ACID (Atomicity, Consistency, Isolation, Durability). – osgx Feb 07 '10 at 01:48
0

Not exactly what was asked, but I use

find BASE_DIRECTORY -type f -exec cat {} >/dev/null \;

to trigger initialization of files in an AWS volume created from a snapshot. It's more focused than the official recommendation of using dd if you just want to read some files.

Federico
-1

I just tried:

dd if=/dev/yourrootpartition of=/dev/null bs=1M count=howmuchmemoryyouwanttofill

It does not give you the control you desire, but it at least tries to use otherwise wasted memory.

-1

I use find / -name stringofrandomcharacter; it helps a lot.

  • Note that this doesn't pre-cache the file contents, only the filesystem metadata. A useful trick if you have a lot of files you're planning to poke at, but not really relevant to the original question. Also, just find / > /dev/null 2>&1 will do the same job with slightly less CPU usage. – Perkins Aug 26 '21 at 19:16
-1

Sometimes I may want to cache files in a certain folder and its subfolders. I just go to this folder and execute the following:

find . -exec cp {} /dev/null \;

And those files are cached.

Highstaker