I have a situation where I need to create hundreds of thousands of 0-byte lock files for concurrency control.

I've tested creating them by using:

for i in $(seq 1 50000); do touch "/run/lock/${i}.lock"; done

Since the files are 0 bytes, they don't take up any space in the partition. Looking at df -h:

Filesystem      Size  Used Avail Use% Mounted on
tmpfs            50M  344K   49M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            246M     0  246M   0% /run/shm
none            100M     0  100M   0% /run/user

The 0% figure doesn't change at all in the /run/lock row.

However, memory usage does increase, by an average of approximately 1KB per lock file. I discovered this by comparing free -h before and after creating 70,000 lock files inside /run/lock. The increase showed up in real memory usage (used memory minus buffers/cache).
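One way to repeat this measurement while looking only at kernel object memory is to compare the `Slab:` line of /proc/meminfo before and after creating the files. This is a bash sketch under a few assumptions: Linux with GNU stat, and that tmpfs inodes and dentries are slab-allocated, so they show up under `Slab:` rather than as file data. The function names are hypothetical, and other activity on the machine adds noise, so use a large count.

```bash
#!/usr/bin/env bash
# Sketch: estimate kernel memory per empty file from the "Slab:" line of
# /proc/meminfo (assumes inode/dentry objects are slab-allocated).

slab_kb() {
  awk '/^Slab:/ { print $2 }' /proc/meminfo
}

# measure_inode_cost DIR COUNT -> prints approximate bytes of slab per file
measure_inode_cost() {
  local dir=$1 count=$2 total free before after i
  # Total and free inodes on DIR's filesystem (the Inodes/IFree columns of df -i).
  total=$(stat -f -c %c "$dir")
  free=$(stat -f -c %d "$dir")
  # Filesystems reporting 0 total inodes have no fixed limit; skip the check there.
  if [ "$total" -gt 0 ] && [ "$count" -gt "$free" ]; then
    echo "only $free free inodes on $dir" >&2
    return 1
  fi
  before=$(slab_kb)
  for ((i = 1; i <= count; i++)); do
    : > "$dir/$i.lock"   # create an empty file, like touch
  done
  after=$(slab_kb)
  # Noise from other activity can make this 0 or even negative for small counts.
  echo $(( (after - before) * 1024 / count ))
}
```

For example, `measure_inode_cost /run/lock 70000` would mirror the experiment described above.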

Later I discovered that this 1KB increase is most likely due to the inodes. So I checked inode usage using df -i:

Filesystem      Inodes  IUsed   IFree IUse% Mounted on
tmpfs            62729    322   62407    1% /run
none             62729  50001   12728   80% /run/lock
none             62729      1   62728    1% /run/shm
none             62729      2   62727    1% /run/user

As you can see, the lock files consume inodes in the /run/lock partition.
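All four tmpfs mounts report the same 62729 inode limit. A worked check of where that number plausibly comes from, assuming the default described in the kernel's tmpfs documentation (nr_inodes defaults to half the number of physical RAM pages, or the number of lowmem pages on a highmem machine, whichever is lower) and 4 KiB pages. The ~1 KB per inode is the measurement from the question, not a kernel constant, and 300000 files is only an illustrative target.

```latex
\begin{align*}
N_{\text{inodes}} &= \tfrac{1}{2}\,P_{\text{RAM}}
  &&\Rightarrow\; P_{\text{RAM}} \approx 2 \times 62729 = 125458\ \text{pages} \\
125458 \times 4\ \text{KiB} &= 501832\ \text{KiB} \approx 490\ \text{MiB}
  &&\text{(physical RAM)} \\
\tfrac{1}{2} \times 490\ \text{MiB} &\approx 245\ \text{MiB}
  &&\text{(default tmpfs size; df shows 246M for /run/shm, which has no size= option)} \\
300000 \times 1\ \text{KiB} &\approx 293\ \text{MiB}
  &&\text{(RAM for 300000 lock files at the measured} \sim\!1\ \text{KB each)}
\end{align*}
```

On a machine of this size, 300000 lock files would therefore take well over half of physical RAM, which is worth keeping in mind when choosing a new inode limit.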

I'm currently on Ubuntu, and the /run mounts are not listed in /etc/fstab. Running mount gives me:

tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)

I have a few questions regarding this (the first one is the most important):

  1. How do I increase the inode limit for /run/lock permanently, so that the limit survives restarts?
  2. Would I be better off creating my own directory and mounting tmpfs on it for this, instead of using /run/lock?
  3. Is each partition's size limit completely independent of the others? Storing files in /run doesn't seem to affect /run/lock, and vice versa.
  4. Is the 1KB derived from the inode? I noticed that when creating non-empty files, each file takes at least one 4KB block.
  5. Why does df show tmpfs in the Filesystem column for /run, but "none" for /run/lock, /run/shm and /run/user, even though all of them are backed by tmpfs?
  6. If all of the directories are independently constrained, how does the OOM killer handle a situation where there are multiple full tmpfs partitions, each sized to 50% of RAM, and processes are also contending for RAM? Obviously one cannot use more than 100% of RAM. The documentation at https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt says the system will deadlock. How does that work?
  • What is it you are doing which requires hundreds of thousands of lock files? This doesn't scale because you shouldn't have this many lock files. – Matthew Ife Aug 03 '15 at 07:49
  • Limited concurrency features in PHP force me to use lock files. But I need a lock file for each cache object to prevent a certain form of cache stampede. – CMCDragonkai Aug 03 '15 at 07:56
  • Are you writing the caching system yourself? Are the cached objects files? – Matthew Ife Aug 03 '15 at 08:07
  • Yes, I am writing the caching system myself. The cached objects are files that must be regenerated every once in a while. While I could look for a third-party caching system, I would prefer to use just the filesystem. – CMCDragonkai Aug 03 '15 at 08:11

2 Answers


Responding to some of your questions, in order:

  1. You can run mount -o remount,nr_inodes=NUM /run/lock in your application's startup script (if it runs as root, uid 0). It should also be safe to add a corresponding line to /etc/fstab, but I haven't tested that.
  2. Separation makes some sense here: if the dedicated mount runs out of inodes, it will not interfere with the rest of the system.
  3. Yes, completely independent.
  4. [...]
  5. With virtual (non-block-device) filesystems, you can put anything as the device in the mount command; only the type matters.
  6. [...]

I'm not sure whether your application creates the empty files by opening them (and how long it keeps them open), but you may also consider increasing the open files limit (check ulimit) to avoid running out of file descriptors.
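Each lock file held open (for example, a file locked with flock on an open descriptor) uses one file descriptor. A minimal bash check of the current shell's limits, assuming Linux. Raising the soft limit this way only affects this shell and the processes it starts; raising the hard limit requires root (CAP_SYS_RESOURCE).

```bash
# Inspect the per-process open-file limits of the current shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# An unprivileged process may raise its soft limit up to the hard limit.
ulimit -Sn "$hard"
```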

  • What if the application is not started as root (uid 0)? I recently discovered `/lib/init/fstab`. Perhaps I can change that too? – CMCDragonkai Aug 03 '15 at 10:42
  • This comment from */lib/init/fstab* answers both these questions: *These are the filesystems that are always mounted on boot, you can override any of these by copying the appropriate line from this file into /etc/fstab and tweaking it as you see fit. See fstab(5).* – sam_pan_mariusz Aug 03 '15 at 10:48

You are going about this in the wrong direction. You can use filesystem semantics to enforce consistency.

  1. When you want to read a file, just open and read it. Always use open, never access, for this operation. If you are using a PHP library to do this, check that it calls open rather than access on the file; fopen should work fine.

  2. When you want to refresh or create a file, perform the following operations:

    • Create a new file using a temporary-file creation mechanism. If none is available, create a filename that is unlikely to exist (filename.XXXXXX, where each X is replaced with a random character). Make sure to open it with O_EXCL.
    • Write the relevant data into the file.
    • Rename the new file to the name of the old file.

This is operationally safe because renames are atomic (within a single filesystem). A reader opening the file will see either the old file or the new file, but never a missing file in the cache.
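The temp-file-and-rename steps can be sketched in bash as follows. The function name is hypothetical; mktemp(1) creates the temp file exclusively (it uses mkstemp, which opens with O_EXCL), and keeping the temp file in the same directory as the target ensures mv performs a rename(2) on one filesystem instead of a copy.

```bash
# atomic_write TARGET
# Read new contents from stdin and atomically replace TARGET: readers that
# open TARGET see either the old file or the complete new one.
atomic_write() {
  local target=$1 tmp
  # Temp file next to TARGET so the final mv is a same-filesystem rename(2);
  # mktemp creates it exclusively, with mode 0600.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if cat > "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"   # clean up if writing or renaming failed
  return 1
}
```

Usage would look like `generate_page | atomic_write /var/cache/myapp/page.html`, where generate_page and the path are hypothetical. Because mktemp uses mode 0600, chmod the temp file before the mv if other users need to read the cache.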

In the worst case, with many concurrent checks of each file, a number of writers will briefly overwrite one another. But this is far cheaper than taking a file lock on each file.

Alternatively, rather than having a lock file for each file, consider just locking each individual cache object directly. I still don't think this would scale, however.
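Locking the cache object directly can be combined with the rename pattern so that only one process regenerates while the others keep serving the old copy, which also addresses the concern raised in the comments that renaming alone does not signal that someone is regenerating. A bash sketch using flock(1) from util-linux with a non-blocking exclusive lock on the cache file itself; the function name is hypothetical, freshness is judged by mtime (GNU stat), and the cache file is assumed to already exist. PHP's flock() with LOCK_EX | LOCK_NB provides the same primitive. No separate lock files are created, so no extra inodes are used.

```bash
# refresh_if_stale CACHE MAX_AGE CMD [ARGS...]
# Regenerate CACHE from CMD's stdout if it is older than MAX_AGE seconds,
# with at most one regenerator at a time. Assumes CACHE already exists.
refresh_if_stale() {
  local cache=$1 max_age=$2
  shift 2
  (
    # Non-blocking exclusive lock on the current cache inode (fd 9):
    # losers return at once and keep serving the existing copy.
    flock -n 9 || exit 0
    # Re-check by path after winning: another process may already have
    # renamed a fresh file into place after we opened the old one.
    age=$(( $(date +%s) - $(stat -c %Y "$cache") ))
    if [ "$age" -lt "$max_age" ]; then
      exit 0
    fi
    # Write to a temp file in the same directory, then rename over CACHE.
    tmp=$(mktemp "${cache}.XXXXXX") || exit 1
    if "$@" > "$tmp" && mv -f "$tmp" "$cache"; then
      exit 0
    fi
    rm -f "$tmp"
    exit 1
  ) 9<"$cache"
}
```

The lock stays on the old inode until the subshell exits; processes that open the path after the rename get the new inode, but its fresh mtime stops them from regenerating again.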

Using rename and link semantics in this case guarantees consistency in your cache and is far cheaper to manage than lock files.

Matthew Ife
  • Thanks for the advice, but I don't think it solves my particular cache stampede problem. You're proposing that reads read from old files, and writes write to a new file. That is a good idea. However I have multiple threads attempting to read the same cache object. If the object has been expired, those threads will attempt to regenerate the cache object. I need to prevent multiple threads from regenerating at the same time, as the regeneration is expensive. I don't see how your renaming pattern will communicate to other threads that a particular thread is currently regenerating. Although... – CMCDragonkai Aug 03 '15 at 08:36
  • ...this is an XY answer, I am interested specifically in the answers to the questions I posted, because the answers will help me beyond just this particular situation that I described in my OP. – CMCDragonkai Aug 03 '15 at 08:37