We run an Apache Cassandra cluster where each host has a few hundred thousand files open at any given time.
We'd like to get a count of open files at periodic intervals and feed that number into graphite, but when we run lsof under collectd, it takes a few minutes to complete and chews up an inordinate amount of CPU in the meantime.
I'm wondering if there's an alternative, more lightweight way to get the same data that lsof provides, or even a way of running lsof that doesn't eat into CPU as noticeably? (Although I assume that latter approach would take even longer to complete than it currently does... not ideal.)
Perhaps the kernel maintains some variable somewhere that contains the number of open files? Wishful thinking?
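For what it's worth, one counter I've stumbled on is /proc/sys/fs/file-nr, though I'm not sure its number is comparable to what lsof reports:

```shell
# /proc/sys/fs/file-nr holds three fields: allocated file handles,
# freed-but-unused handles, and the system-wide maximum.
# The first field is a cheap system-wide open-handle count, but it
# won't match lsof exactly (lsof also walks sockets, maps, etc.).
open_handles=$(awk '{print $1}' /proc/sys/fs/file-nr)
echo "$open_handles"
```

Reading one line from procfs is effectively free compared to lsof, which stats every descriptor of every process.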
Update:
In response to one of the answers, we're already using the -b and -n flags. Here's the full command as I have it running under collectd:
sudo lsof -b -n -w | stdbuf -i0 -o0 -e0 wc -l
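If a per-process count would be acceptable, I've also been experimenting with listing /proc/&lt;pid&gt;/fd directly. This is just a sketch; the pgrep lookup is my assumption about how we'd locate the Cassandra JVM on our hosts:

```shell
# Count open fds for a single process by listing /proc/<pid>/fd.
# This is far cheaper than lsof, which resolves names, devices, and
# network endpoints for every descriptor on the whole system.
pid=$$   # placeholder: for Cassandra we'd use something like $(pgrep -of java)
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "$fd_count"
```

It only covers one process rather than the whole host, but for a Cassandra box that one JVM accounts for nearly all of the open files anyway.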