
We run an Apache Cassandra cluster where each host has a few hundred thousand files open at any given time.

We'd like to be able to get a count of open files at periodic intervals and feed this number into graphite, but when we run lsof under collectd, it ends up taking a few minutes to complete and chewing up an inordinate amount of CPU in the meantime.

I'm wondering if there's an alternate and more friendly means of getting the same data that lsof provides, or even a way of running lsof that won't eat into CPU as noticeably? (Although I assume this latter method would likely take much longer to complete than it currently does... not ideal).

Perhaps the kernel maintains some variable somewhere that contains the number of open files? Wishful thinking?

Update:

In response to one of the answers: we're already using the -b and -n flags. Here's the full command as I have it running under collectd:

sudo lsof -b -n -w | stdbuf -i0 -o0 -e0 wc -l

2 Answers


You probably don't need to resolve the network addresses for sockets, so at the least use the -n switch. You may also want to skip blocking operations with -b.

These first two switches alone should make it noticeably faster.

Then add -l to avoid resolving UIDs, and -L to suppress the listing of file link counts. And so on; see man lsof.
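
Putting those switches together, a minimal sketch of the counting command (the -w flag, carried over from the question, just suppresses warnings):

sudo lsof -b -n -l -w | wc -l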

Alternatively, on Linux, you could simply count the symlinks under /proc/<PID>/fd like this:

find /proc -mindepth 3 -maxdepth 3 -type l | awk -F/ '$4 == "fd" { s++ } END { print s }'
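
If you only care about one process rather than the whole host, a cheaper sketch is to count the entries in that process's fd directory directly. The pgrep pattern below is an assumption; match it to whatever your Cassandra process is actually called:

# count open fds for the oldest process whose command line matches "cassandra"
# (run as root, or as the process owner, to be able to read /proc/<PID>/fd)
ls /proc/"$(pgrep -o -f cassandra)"/fd | wc -l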


You're doing it wrong.

From man proc:

   /proc/sys/fs/file-nr

This (read-only) file contains three numbers: the number of allocated file handles (i.e., the number of files presently opened); the number of free file handles; and the maximum number of file handles (i.e., the same value as /proc/sys/fs/file-max). If the number of allocated file handles is close to the maximum, you should consider increasing the maximum. Before Linux 2.6, the kernel allocated file handles dynamically, but it didn't free them again. Instead the free file handles were kept in a list for reallocation; the "free file handles" value indicates the size of that list. A large number of free file handles indicates that there was a past peak in the usage of open file handles. Since Linux 2.6, the kernel does deallocate freed file handles, and the "free file handles" value is always zero.

If you cat that file, the first value appears to give precisely what you're after.
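
For feeding the number into graphite under collectd, a minimal sketch that extracts just that first field (reading this one small file should cost essentially nothing compared to lsof):

awk '{ print $1 }' /proc/sys/fs/file-nr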

For the record, I couldn't get the lsof output to match it even with some amount of fudging, but I gather that if that's what the kernel says, it's more authoritative than the list you get from lsof anyway.

  • Here is my lsof output: `[root@ec2-cassandra101 ~]$ time lsof -b -n -w -l -L | stdbuf -i0 -o0 -e0 wc -l 1018065`. Here is what file-nr says: `[root@ec2-cassandra101 ~]$ cat /proc/sys/fs/file-nr 2784 0 3093428`. The large discrepancy (1,000,000+ versus 2784) is due to the fact that `lsof` includes all things that do not have a file descriptor associated with them: library files, executables, etc. So, if you're only interested in file descriptors, then `file-nr` is the way to go; otherwise you need lsof or equivalent. – Michael Martinez Apr 08 '17 at 20:26
  • Try `inode-nr` instead in the same location then. – Matthew Ife Apr 08 '17 at 21:36