13

I have a Java process (Glassfish) that is leaking file descriptors. I know this because I get the helpful `java.io.IOException: Too many open files` exception. I can look in /proc/<PID>/fd and see all the open file descriptors. When I use lsof I get a very large number of entries like this:

java 18510 root 8811u sock 0,4 1576079 can't identify protocol
java 18510 root 8812u sock 0,4 1576111 can't identify protocol
java 18510 root 8813u sock 0,4 1576150 can't identify protocol

I see 12 new ones created per minute. What options can I use on lsof or what other tools are available to me to help track down socket file descriptors where the protocol can't be identified?

7ochem
cclark
  • // , A lot of great responses to this question are but a search engine query away... https://duckduckgo.com/?q=How+to+track+down+a+file+descriptor+leak – Nathan Basanese Nov 17 '16 at 06:31

3 Answers

7

To see the 20 processes holding the most file handles:

for pid in $(ps -eo pid=); do echo "$(ls /proc/$pid/fd 2>/dev/null | wc -l) $pid $(cat /proc/$pid/cmdline 2>/dev/null)"; done | sort -n -r | head -n 20

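Each output line has the format: file handle count, PID, command line of the process. Arguments can appear run together because /proc/<PID>/cmdline separates them with NUL bytes rather than spaces.

Example output: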

701 1216 /sbin/rsyslogd-n-c5
169 11835 postgres: spaceuser spaceschema [local] idle
164 13621 postgres: spaceuser spaceschema [local] idle
161 13622 postgres: spaceuser spaceschema [local] idle
161 13618 postgres: spaceuser spaceschema [local] idle
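
Once a single process stands out, it can help to see what kinds of descriptors it holds. The following is a minimal sketch, assuming a Linux /proc filesystem and permission to read the target's fd directory; `fd_types` is a hypothetical helper name, not a standard tool. It classifies each descriptor by the target of its symlink and prints a count per type.

```shell
# Summarize the descriptor types held by one process (Linux /proc assumed).
# fd_types is a hypothetical helper name used only for this illustration.
fd_types() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink such as socket:[1576079], pipe:[1234], or a file path.
        # Skip entries that vanish or cannot be read while we iterate.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -n -r
}

# Example: inspect the current shell; replace "$$" with the PID of interest.
fd_types "$$"
```

Running it repeatedly against the leaking PID should show which type is growing; in the question's case that would presumably be the socket count.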
johnjamesmiller
4

Become familiar with the strace command. It monitors system calls. I recently used it to track down file descriptor leaks that were causing our snmpd daemon to crash repeatedly. It takes some getting used to, but it's a powerful tool.

You can use strace to attach to a running process (don't forget the -f flag to follow child processes).
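
As a rough sketch of how that might look in practice: the commented command records only `socket` and `close` calls to a file, and the function then lists descriptors that were created but never closed. The PID 18510 comes from the lsof output in the question; the log path and the `unclosed_sockets` helper name are assumptions made for illustration. The parser only understands complete single-line entries, so calls that strace splits into "unfinished"/"resumed" pairs may show up as false positives.

```shell
# Capture step (needs sufficient privileges; stop with Ctrl-C when done):
#   strace -f -p 18510 -e trace=socket,close -o /tmp/fd.trace
#
# unclosed_sockets is a hypothetical helper: it prints the socket() lines
# whose returned descriptor was not released by a later successful close().
unclosed_sockets() {
    awk '
        # e.g. "18510 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 8811"
        /socket\(.*\) += [0-9]+$/ { open[$NF] = $0 }
        # e.g. "18510 close(8811)        = 0"
        /close\([0-9]+\) += 0$/ {
            fd = $0
            sub(/.*close\(/, "", fd)   # drop everything up to the fd number
            sub(/\).*/, "", fd)        # drop everything after it
            delete open[fd]
        }
        END { for (fd in open) print open[fd] }
    ' "$1"
}
```

Usage would be `unclosed_sockets /tmp/fd.trace`. Adding `-tt` to the strace command puts a timestamp on each line; the patterns above are unanchored at the start, so they should tolerate that extra column.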

1

What exactly are you trying to track down? The remote IP address(es) associated with the leaked FDs, the defective code, or something else?

As you've already identified that there is a leak, contacting the engineers responsible for this java process seems like a reasonable next step.

An̲̳̳drew
  • I'm trying to track down any information I can about those file descriptors. `can't identify protocol` doesn't give the engineers much to run with. Are there tools or options in lsof that I'm not seeing which I should be using? The problem doesn't happen in the test environment and only started in this environment after a cabinet migration. The same code had no issues before the migration, and when the application is undeployed, Glassfish still leaks on its own. My best guess is that something broke from a networking perspective: sockets try to initialize but can't, then hang and are left around. – cclark Apr 26 '10 at 18:09