I need to set up an in-memory storage system for around 10 GB of data, consisting of many single files of around 100 kB each (images). There will be lots of reads and fairly periodic writes (adding new files, deleting some old ones).
Now, I know that tmpfs behaves like a regular file system for which you can, for example, check free/used space with df, which is a nice feature to have. However, I'm interested in whether ramfs would offer any advantage in the speed of I/O operations. I know that I cannot control the size of consumed memory when using ramfs, and that my system can hang if it completely consumes the free RAM, but that will not be an issue in this scenario.
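For readers unfamiliar with the practical difference, here is a minimal sketch (mount points are hypothetical) of how each is mounted and of the df behaviour mentioned above:

# tmpfs accepts a size limit and reports usage through df
mount -t tmpfs -o size=12g tmpfs /mnt/tmpfs

# ramfs ignores any size option and grows without bound
mount -t ramfs ramfs /mnt/ramfs

df -h /mnt/tmpfs /mnt/ramfs   # tmpfs shows size/used/avail; ramfs typically reports zeros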

To sum it up, I'm interested in:
- Performance-wise, which is faster: ramfs or tmpfs (and possibly why)?
- When does tmpfs use swap space? Does it move already-saved data to swap (to free RAM for other programs currently running), or only new data if at that moment there is no free RAM left?

– Ivan Kovacevic

4 Answers


My recommendation:

Measure and observe real-life activity under normal conditions.

Those files are unlikely to all be needed and served from cache at all times. But there's a nice tool called vmtouch that can tell you what's in cache at a given moment. You can also use it to lock certain directories or files into cache. So see what things look like after some regular use. Using tmpfs or ramfs is not necessary for this situation.

See: http://hoytech.com/vmtouch/

I think you'll be surprised to see that the most active files will probably be resident in cache already.
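As an illustration, typical vmtouch usage looks roughly like this (the directory path is just an example):

# Report what fraction of the directory tree is resident in the page cache
vmtouch -v /var/www/images

# Read everything into cache, then lock it there (-d daemonizes the process)
vmtouch -t /var/www/images
vmtouch -dl /var/www/images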


As far as tmpfs versus ramfs goes, there's no appreciable performance difference. There are operational differences, though. A real-life use case is Oracle, where ramfs was used to let Oracle manage data in RAM without the risk of it being swapped; tmpfs data can be swapped out under memory pressure. There are also differences in resizing and modifying settings on the fly.
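For example, a tmpfs mount can be resized on the fly with a simple remount, while ramfs has no equivalent (mount point hypothetical):

# Grow a live tmpfs mount without unmounting it
mount -o remount,size=12G /mnt/tmpfs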

– ewwhite
  • Awesome little utility! +1 – Janne Pikkarainen Apr 24 '14 at 11:48
  • @ewwhite Excellent reply. In one of our cases a few years back we did indeed find out that the files mostly used are resident in cache already. Hint: file systems nowadays are much more intelligent than one might think. – giannisapi Sep 01 '16 at 12:58

Don't over-think this. Put enough RAM in your system and let the kernel's disk cache take care of things for you. That way you get the benefit of reads coming directly from memory, while still being able to persist data on disk.
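To see how much the kernel is already caching, a quick sketch (the drop_caches line is for testing only and requires root):

# The "buff/cache" column is RAM the kernel already uses to cache file data
free -h

# For testing: flush the page cache, then watch it refill as files are served
sync; echo 3 > /proc/sys/vm/drop_caches
free -h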

– EEAA
  • My system currently has 16 GB of RAM. It's a plain Debian install running Nginx to serve those images. I have a 1 Gbit network connection which will be under 100% load all the time, serving those images in no particular order. Do you think that the kernel will load all 10 gigs of images into the cache anyway in this scenario? – Ivan Kovacevic Apr 19 '14 at 17:53
  • Yes, if there is sufficient RAM in the system, and other applications on the server are not competing for RAM resources, those files will remain in cache. – EEAA Apr 19 '14 at 18:14
  • In that case you are probably right that this is unnecessary in my case. However, I'm going to leave the question open for a while, just for the sake of the ramfs vs. tmpfs argument, in a scenario where RAM storage would be useful... I suppose such a scenario exists?! – Ivan Kovacevic Apr 19 '14 at 18:20
  • I've been at Unix administration for ~15 years, and I've *never* run into a situation where tmpfs/ramfs would have provided *any* benefit over the native kernel fs cache. That's not to say that situations don't exist out there where they would be warranted, but they're quite rare. Typically, if you need a RAM cache for things, you use a purpose-built caching layer (Redis/Memcached/etc.). – EEAA Apr 19 '14 at 18:22
  • Definitely nice to have your input on this, thanks! Then the tmpfs "boosting" stories around the internet are essentially an urban myth. – Ivan Kovacevic Apr 19 '14 at 18:26
  • Maybe. Maybe not. Like I said, there *may* be circumstances where they're of benefit. In your situation, though, that's likely not the case. – EEAA Apr 19 '14 at 18:31
  • Disk caching will certainly work for the case where the images need to be read, but tmpfs or ramfs could still be useful if you wish to speed up a lot of random/small writes but are bound to a disk that is slow with random I/O. Do keep in mind that if the machine crashes or suffers a power failure, the contents of tmpfs will be gone, since they were (only) in memory. – Martijn Apr 22 '14 at 22:19
  • @Martijn is right. tmpfs and ramfs are indeed useful. For example, I am doing an intensive rewrite (filter-branch) of a git repository. Doing it in memory is gobs faster than doing it on my SSD. Caching helps with reads, not writes, since (normally) Linux has to meet some guarantees about the permanence of disk operations. – Paul Draper Jan 24 '15 at 17:42

1) Performance benchmark.

Using this page as a reference, I did an I/O comparison between tmpfs and ramfs, and the results show that they are pretty much identical in terms of performance:

# mount | grep -E "tmp|ram"
tmpfs on /dev/shm type tmpfs (rw)
ramfs on /mnt/ram type ramfs (rw,size=1G)

# dd bs=1M count=1024 if=/dev/zero of=/dev/shm/test conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.634054 s, 1.7 GB/s

# dd bs=1M count=1024 if=/dev/zero of=/mnt/ram/test conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.559557 s, 1.9 GB/s

# dd bs=1M count=4096 if=/dev/zero of=/dev/shm/test conv=fdatasync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 2.5104 s, 1.7 GB/s

# dd bs=1M count=4096 if=/dev/zero of=/mnt/ram/test conv=fdatasync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 2.36923 s, 1.8 GB/s

2) According to this page, tmpfs uses swap, and ramfs does not use swap.
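One operational caveat when benchmarking like this: the size=1G option on the ramfs mount above is silently ignored by the kernel, whereas tmpfs enforces its limit. A sketch:

# tmpfs fails with "No space left on device" once its size limit is reached;
# ramfs keeps accepting writes and consuming RAM until the machine runs out
dd bs=1M count=2048 if=/dev/zero of=/mnt/ram/overflow   # exceeds size=1G, still succeeds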

– Michael Martinez
  • Your answer is on the right track. However, I would not agree with your conclusion regarding performance: your tests show that there ARE differences of 0.2 GB/s and 0.1 GB/s in favour of ramfs. I believe this should be tested even further to provide a valid statistical sample. Regarding 2): yes, that is known; however, I wish I could get a better insight into exactly when swap is used. – Ivan Kovacevic Apr 24 '14 at 12:49
  • If we did this benchmark a bunch of times with different file sizes, I don't think we'd see a difference. You'll notice that when I bumped the size up four times, the difference actually narrowed rather than widened. – Michael Martinez Apr 24 '14 at 16:52
  • What about the case where you have a bunch of small files? For example, writing a million 100-200 kB files. Also, do you get the 0.2 GB/s difference repeatedly for the same file size? That would definitely point to a performance difference. I will probably test this myself when it's on my schedule, but that is why I asked here, so I could maybe cross it off the to-do list if anyone else has already done it. – Ivan Kovacevic Apr 24 '14 at 17:27
  • Yeah, the only way to know for sure is to do the tests. – Michael Martinez Apr 24 '14 at 17:35
  • @MichaelMartinez What is important for such databases is **latency**. ramfs, by being less featured and sitting directly on the underlying page cache, might offer better performance. And yes, I'm in a situation where putting the RocksDB in RAM, compared to an SLC SSD, isn't enough. – user2284570 Mar 27 '22 at 18:32
  • @user2284570 dd gives an accurate benchmark for latency regardless of what you intend to use the host for, database or otherwise. – Michael Martinez Mar 29 '22 at 00:28
  • @MichaelMartinez Bandwidth is not always latency. – user2284570 Mar 29 '22 at 00:58
  • @user2284570 dd output reports both latency and bandwidth numbers. – Michael Martinez Mar 30 '22 at 01:32
  • @MichaelMartinez It mostly measures bandwidth, with a minimal number of requests in a given timeframe. Bandwidth with dd is computed from the time taken to perform the copy, so if you care only about bandwidth, you won't see, for example, the latency difference between two RAM modules running at the same MHz. – user2284570 Mar 30 '22 at 20:59
  • It doesn't tell you whether ramfs is possibly far faster than tmpfs in terms of access requests. – user2284570 Mar 30 '22 at 21:00
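Following up on the small-file case raised in the comments, a sketch of such a test (file count and paths arbitrary):

# Write 10,000 files of 100 kB each to each mount and compare wall-clock time
time sh -c 'for i in $(seq 10000); do dd bs=100K count=1 if=/dev/zero of=/dev/shm/f$i 2>/dev/null; done'
time sh -c 'for i in $(seq 10000); do dd bs=100K count=1 if=/dev/zero of=/mnt/ram/f$i 2>/dev/null; done'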

If you have a sufficient amount of RAM installed to host the various kernel buffers, the application stacks and heaps, the regular file system cache, and all the files you intend to put in it, ramfs should never be slower than tmpfs, as by design there is no risk of physical I/O. Physical I/O is undoubtedly the main cause of performance degradation in that area.

However, if you do not have that much RAM installed, using ramfs might, and probably will, be slower than tmpfs, as the latter uses the virtual memory heuristics to decide what should rather be on disk (i.e. in the swap area) versus what should be in RAM, while with ramfs, your file system data is pinned in RAM, which might be a waste of resources.

To answer your second question: yes, tmpfs will move old "cold" data to the swap area first, not the most recently used "hot" data.
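To observe this behaviour yourself, a rough sketch:

# Watch global swap consumption change while a tmpfs mount fills up
grep -E 'SwapTotal|SwapFree' /proc/meminfo

# vm.swappiness biases how eagerly the kernel swaps pages out
# instead of shrinking the file cache
sysctl vm.swappiness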

– jlliagre