16

Our server recently ran out of file descriptors, and I have some questions regarding that. `ulimit -n` is supposed to give me the maximum number of open file descriptors. That number is 1024. I checked the number of open file descriptors by running `lsof -u root | wc -l` and got 2500 fds. That is a lot more than 1024, so I guessed that would mean the number 1024 is per process, not per user, as I thought. Well, I ran `lsof -p$PidOfGlassfish | wc -l` and got 1300. This is the part I don't get. If `ulimit -n` is not the maximum number of file descriptors per user or per process, then what is it good for? Does it not apply to the root user? And if so, how could I then get the error messages about running out of file descriptors?

EDIT: The only way I can make sense out of `ulimit -n` is if it applies to the number of open files (as stated in the bash manual) rather than the number of file handles (different processes can open the same file). If this is the case, then simply listing the number of open files (grepping on '/', thus excluding memory mapped files) is not sufficient:

lsof -u root |grep /|sort  -k9  |wc -l #prints '1738'

To actually see the number of open files, I would need to filter on the name column and only print the unique entries. Thus the following is probably more correct:

lsof -u root |grep /|sort  -k9 -u |wc -l #prints '604'

The command above expects output in the following format from lsof:

java      32008 root  mem       REG                8,2 11942368      72721 /usr/lib64/locale/locale-archive
vmtoolsd   4764 root  mem       REG                8,2    18624     106432 /usr/lib64/open-vm-tools/plugins/vmsvc/libguestInfo.so

This at least gives me a number less than 1024 (the number reported by `ulimit -n`), so this seems like a step in the right direction. "Unfortunately" I am not experiencing any problems with running out of file descriptors, so I will have a hard time validating this.
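A minimal sketch of how one might list per-process descriptor counts instead of the per-user total (assuming a root shell, since /proc/<pid>/fd of another user's processes is not otherwise readable; the ps call is only there to label each PID):

for pid in $(pgrep -u root); do
    printf '%6s  %s\n' "$(ls /proc/$pid/fd 2>/dev/null | wc -l)" "$(ps -p $pid -o pid=,comm=)"
done | sort -rn | head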

oligofren
    lsof reports memory mappings as well as open files, so your 'wc' pipeline yields an over-estimate of the number of file descriptors used by that process. – Richard Kettlewell Jun 09 '12 at 11:52
  • aha! now that is good info. But I am not quite sure I understand. By "memory mappings", you mean a memory mapped file? That would require a file handle to my understanding, or how else would the OS be able to update the file? – oligofren Jun 11 '12 at 16:11
  • And followup two: What would be a good way of finding all open file handles - the ones that are actually affected by the limits imposed by "ulimit -n"? – oligofren Jun 11 '12 at 16:12
  • 1
    Memory mappings don’t require an open file. If you want to list open files only, filtering the output of lsof is probably the easiest approach. – Richard Kettlewell Jul 07 '12 at 13:49
  • Thanks, edited my answer. Using `lsof -u root |grep /|sort -k9 -u` seems to give what amounts to a reasonable answer. This is at least a number less than ulimit -n. – oligofren Jul 16 '12 at 14:11

6 Answers

11

I tested this in Linux version 2.6.18-164.el5 - Red Hat 4.1.2-46. I could see that the ulimit is applied per process.

The parameter is set at user level, but applied for each process.

E.g.: 1024 was the limit. Multiple processes were started and the files opened by each one were counted using

ls -l /proc/$pid/fd/ | wc -l

There were no errors when the sum of files opened by multiple processes crossed 1024. I also verified the unique file count by combining the results for the different processes and counting unique files. The errors started appearing only when the count for a single process crossed 1024 (java.net.SocketException: Too many open files in the process logs).
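A rough sketch of the same check (assuming a root shell and the procps tools): sum the descriptor counts of every process owned by a user and compare the total with the per-process limit; the sum can exceed the limit without any errors appearing.

limit=$(ulimit -n)
total=0
for pid in $(pgrep -u root); do
    n=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
    total=$((total + n))
done
echo "sum across processes: $total, per-process limit: $limit"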

Chosen
  • Thanks for testing this out. Do you have any idea why `lsof -p$PidOfGlassfish|wc -l` gave me 1300? I am guessing the two approaches to counting differ somehow. If not, then maybe the limit does not apply to the root user? – oligofren May 21 '14 at 07:52
  • Just curious, why use `ls -l` instead of `ls`? The former has an extra line (e.g. `total 5`) when there are 5 files. In such a case using `ls -l` in the above example would report 6 not 5. I use `ls /proc/<pid>/fd | wc -l`. – starfry Mar 06 '19 at 11:14
  • @starfry That's just sloppiness on my part. I usually do this stepwise, and `ls -l` gives me one entry per line, which I then pipe into something else. Of course, this also happens when piping normal `ls` (but not otherwise). – oligofren Apr 30 '19 at 08:48
5

The ulimit is for file handles. It applies to files, directories, sockets, pipes, epolls, eventfds, timerfds, etc.

At any point during the process's startup the limits might have been changed. Check /proc/<pid>/limits and see if the values have been altered.
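A small sketch of that check, using the glassfish PID from the question (the variable $PidOfGlassfish is assumed to hold it):

grep 'Max open files' /proc/$PidOfGlassfish/limits
# prints the soft and hard limits actually in force for that process, e.g.
# Max open files            1024                 4096                 files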

Matthew Ife
5

@oligofren

I also carried out some testing to determine how the "ulimit -Sn" limit for "open files" was enforced.

  • As the poster Chosen mentioned above, the ulimit for "open files" is indeed applied per process. To see what the process's current limits are:

    cat /proc/__process_id__/limits

  • To determine how many files a process has open, you need to use the following command:

    lsof -P -M -l -n -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' -p __process_id__ -a | awk '{if (NR>1) print}' | wc -l

Explanation of the above and my testing method / results

The "-P -M -l -n" arguments to lsof are simply there to make lsof operate as fast as possible. Feel free to take them out.

-P - inhibits the conversion of port numbers to port names for network files
-M - disable reporting of portmapper registrations for local TCP, UDP and UDPLITE ports
-l - inhibits the conversion of user ID numbers to login names
-n - inhibits the conversion of network numbers to host names for network files

The "-d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt'" argument instructs lsof to exclude file descriptors of type: cwd/err/ltx/mem/mmap/pd/rtd/txt.

From lsof man page:

   FD         is the File Descriptor number of the file or:

                   cwd  current working directory;
                   Lnn  library references (AIX);
                   err  FD information error (see NAME column);
                   jld  jail directory (FreeBSD);
                   ltx  shared library text (code and data);
                   Mxx  hex memory-mapped type number xx.
                   m86  DOS Merge mapped file;
                   mem  memory-mapped file;
                   mmap memory-mapped device;
                   pd   parent directory;
                   rtd  root directory;
                   tr   kernel trace file (OpenBSD);
                   txt  program text (code and data);
                   v86  VP/ix mapped file;

I deemed "Lnn,jld,m86,tr,v86" as not applicable to Linux and hence didn't bother to add them to the exclusion list. I'm not sure about "Mxx".

If your application makes use of memory mapped files/devices then you may want to remove "^mem" and "^mmap" from the exclusion list.
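To see how much of the raw lsof output those mapping entries account for, one can invert the filter and count only them (a sketch; __process_id__ as above):

lsof -p __process_id__ -a -d 'mem,mmap' | awk '{if (NR>1) print}' | wc -l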

EDIT ---begin snip---

Edit: I found the following link which indicates that:

memory mapped .so-files technically aren't the same as a file handle the application has control over. /proc/<pid>/fd is the measuring point for open file descriptors

So if your process does use memory mapped files, you will need to filter out *.so files.

Also, Sun's JVM will memory map jar files

A memory-mapped JARfile, in this case the file that holds the "JDK classes." When you memory-map a JAR, you can access the files within it very efficiently (versus reading it from the start each time). The Sun JVM will memory-map all JARs on the classpath; if your application code needs to access a JAR, you can also memory-map it.

So things like tomcat/glassfish will also show memory mapped jar files. I've not tested whether these count towards the "ulimit -Sn" limit.

EDIT ---end snip---

Empirically, I've found that "cwd,rtd,txt" are not counted with regards to the per process file limit (ulimit -Sn).

I'm not sure whether "err,ltx,pd" are counted towards the file limit as I don't know how to create file handles of these descriptor types.

The "-p __process_id__" argument restricts lsof to only return information for the __process_id__ specified. Remove this if you want to get a count for all processes.

The "-a" argument is used to AND the selections (i.e. the "-p" and "-d" arguments).

The "awk '{if (NR>1) print}'" statement is used to skip the header that lsof prints in its output.

I tested using the following perl script:

File: test.pl
---snip---
#!/usr/bin/perl -w
# Open 1100 files and keep the handles open, so the per-process limit is eventually hit.
foreach $i (1..1100) {
  $FH="FH${i}";
  open ($FH,'>',"/tmp/Test${i}.log") || die "$!";
  print $FH "$i\n";
}
---snip---

I had to execute the script in the perl debugger to ensure the script doesn't terminate and release the file descriptors.

To execute: perl -d test.pl

In perl's debugger, you can run the program by entering c and pressing enter. If your ulimit -Sn has a value of 1024, you'll find that the program stops after creating the Test1017.log file in /tmp.

If you now identify the pid of the perl process and use the above lsof command you will see that it also outputs 1024.

Remove the "wc -l" and replace with a "less" to see the list of files that counted towards the 1024 limit. Remove the "-d ^....." argument as well to see that the cwd,txt and rtd descriptors didn't count towards the limit.

If you now run "ls -l /proc/__process_id__/fd/ | wc -l", you will see a value of 1025 returned. This is because ls added a "total 0" header to its output which got counted.
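A couple of header-free alternatives for the same count (minor sketches):

ls /proc/__process_id__/fd | wc -l                 # plain ls prints no "total" line
find /proc/__process_id__/fd -mindepth 1 | wc -l   # counts the fd symlinks themselves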

Note:

To check whether the OS is running out of file descriptors, it is better to compare the value of:

cat /proc/sys/fs/file-nr | awk '{print $1}'

with

cat /proc/sys/fs/file-max

https://www.kernel.org/doc/Documentation/sysctl/fs.txt documents what file-nr and file-max mean.
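A small sketch that reads all three file-nr fields (allocated handles, free handles, maximum) in one go:

read allocated free max < /proc/sys/fs/file-nr
echo "system-wide: $allocated of $max file handles allocated"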

Jinesh Choksi
0

You want to take a look at the system-wide limit set in /proc/sys/fs/file-max and adjust it there (until the next reboot), or set fs.file-max in sysctl.conf to make it permanent. This might be helpful: http://www.randombugs.com/linux/tuning-file-descriptors-limits-on-linux.html
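A sketch of both approaches (the value 200000 is only an example; both commands need root):

sysctl -w fs.file-max=200000                      # temporary, lost at the next reboot
echo 'fs.file-max = 200000' >> /etc/sysctl.conf   # persistent; apply with: sysctl -p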

rnxrx
  • 1
    That comment about bash isn't accurate. ulimit imposes a per-user-id set of limits for processes initiated via the shell, which is essentially everything, thanks to how the process tree is spawned on Unix-like operating systems. It's not bash. – EightBitTony Jun 08 '12 at 12:39
  • Sorry, will edit, but the comment about system-wide limits still stands. – rnxrx Jun 08 '12 at 15:07
  • It's very unlikely that he's hitting the system wide limits. Possible, but *very* unlikely. – David Schwartz Jun 08 '12 at 20:24
  • EightBitTony: ulimit does not set a per-user-id set of limits. It's per process when the pam_limits are applied. The ulimit that is "per user" is "ulimit -u", "the maximum number of processes available to a single user". – No Username May 21 '14 at 04:43
0

It seems like your reasoning is something like, "I have to lower that limit so I don't run out of precious descriptors". The truth is exactly the reverse -- if your server ran out of file descriptors, you need to raise that limit from 1,024 to something larger. For a realistic glassfish implementation, 32,768 is reasonable.

Personally, I always raise the limit to around 8,192 system-wide -- 1,024 is just ridiculous. But you'll want to raise glassfish higher. Check /etc/security/limits.conf. You can add a special entry for the user glassfish runs as.
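A sketch of what such an entry might look like (the account name glassfish is an assumption; substitute whatever user the server actually runs as, and note the new limit only takes effect on the next login / service restart):

# /etc/security/limits.conf
glassfish   soft   nofile   32768
glassfish   hard   nofile   32768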

David Schwartz
  • I am not sure how you could interpret me to mean that :-) What I was wondering is why it did not seem to apply. I will set it higher, but I want to understand how it works as well. If the limit is 1024, then how could Glassfish have 1300 handles? – oligofren Jun 11 '12 at 16:08
  • 'lsof -u root |grep /|sort -k9 -u' prints the unique file descriptor entries. I guess the number of lines from this is the actual number ulimit -n applies to. – oligofren Jul 16 '12 at 14:15
0

It is a common mistake to compare the result of a raw lsof call with the supposed limit.

For the global limit (/proc/sys/fs/file-max), you should have a look at /proc/sys/fs/file-nr: the first value indicates how many are used and the last value is the limit.

The open-file limit is per process, but it can be defined per user: see "ulimit -Hn" for user limits and /etc/security/limits.conf for the definitions. It is generally applied to the "app user", e.g. "tomcat": set the limit to 65000 for the user tomcat and it will apply to the Java process it runs.

If you want to check the limit applied to a process, get its PID and then:

cat /proc/${PID}/limits

If you want to check how many files are opened by a process, get its PID and then:

ls -1 /proc/${PID}/fd | wc -l

(note that for ls it's 'minus one', not to be confused with 'minus el')

If you want to know the details with lsof, but only for the file handles that count towards the limit, try these:

lsof -p ${PID} | grep -P "^(\w+\s+){3}\d+\D+"
lsof -p ${PID} -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' -a
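To see what kinds of handles those are, a sketch that tallies the lsof TYPE column (REG, FIFO, IPv4, unix, ...) for the descriptors that count towards the limit:

lsof -p ${PID} -a -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' | awk '{if (NR>1) print $5}' | sort | uniq -c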

Remark: the 'files' include regular files, pipes, TCP connections, etc.

Note that you will sometimes need to be root, or to use sudo, to obtain correct results from these commands; without privileges you sometimes get no error, just fewer results.

And finally, if you want to know which 'files' on your filesystem are accessed by a process, have a look at:

lsof -p ${PID} | grep / | awk '{print $9}' | sort | uniq

Have fun!

Ronan Kerdudou