Debugging connection timeouts using strace?

6

2

I'm trying to figure out why is Eclipse spinning, so I decided to fire up strace.

I found the eclipse process using:

$ ps ax | grep java
 5546 ?        Sl    19:04 /usr/bin/java ... [arguments omitted]

By running strace on this process, I see that it's waiting on another process:

$ sudo strace -p 5546
Process 5546 attached - interrupt to quit
futex(0x7f6c416679d0, FUTEX_WAIT, 5547, NULL^C <unfinished ...>
Process 5546 detached

Interestingly, process 5547 doesn't appear in ps (can anyone tell me why?), but I can strace it. It is repeatedly spitting out a lot of EAGAIN failures (with the occasional success)

read(16, 0x7f6c41664d10, 16)            = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15, 0x7f6c3815f2e4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {67410, 357843264}) = 0
poll([{fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=68, events=POLLIN}, {fd=128, events=POLLIN}, {fd=69, events=POLLIN}], 5, 0) = 0 (Timeout)
read(16, 0x7f6c41664cb0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15, 0x7f6c3815f2e4, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
...

Judging from the output, it looks like it is polling the file descriptors 16, 15, 68, 128 and 69. Particularly, the EAGAIN errors are coming from fds 15 and 6, as seen from the read(2) and recvfrom(2) calls.

How do I find more information on those fds? I tried lsof -p 5547 but no output is printed. I suspect these are sockets that are open to some website, but why it is spinning in a tight loop with EAGAIN failures is puzzling...

jabalsad

Posted 2012-06-13T11:33:05.207

Reputation: 1 137

Answers

3

Some PIDs are not listed because they belong to threads. htop can show them if you press Shift+H (and optionally T for tree view), but lsof wants the PID of the main process. (All pthreads in a process share file descriptors, anyway.) You can also take a look in /proc/5546/fd/ and /proc/5546/task/.

EAGAIN is normal for non-blocking I/O; for example, it is returned by read() when there's no data to read. See "ERRORS" in read(2), write(2) and so on. Some of these fd's are likely connections to the X11 server – non-blocking I/O is used by the X11 client libraries.

user1686

Posted 2012-06-13T11:33:05.207

Reputation: 283 655

Thanks, I have now become a tiny bit better at debugging :-) – jabalsad – 2012-06-13T13:32:36.277

Also, wouldn't it be less resource intensive if the sockets use callbacks to know when data has arrived? – jabalsad – 2012-06-13T13:34:00.673

@jabalsad: poll() is already used for that. I don't know, though, why it returns {fd=16, events=POLLIN} if there's nothing to read() from it.. – user1686 – 2012-06-13T13:34:33.590

Do you mean that when poll() is called, the application should know that data is available for read or not?. If so, why is a read() still performed when there is no data available? – jabalsad – 2012-06-13T13:36:32.153

Nevermind, your added comment answered my question. – jabalsad – 2012-06-13T13:37:35.427

@jabalsad: Ah, I just noticed a few things. One, the fd's displayed in strace output are actually input to poll(), not its output (as would be the case with select()), so naturally strace shows all polled fd's. Second, poll() returns 0, meaning "timeout, no events", so the fd's really don't have any new data; the library probably just fails to check the return code properly. – user1686 – 2012-06-13T13:47:44.200

2

The third parameter to futex(2) is not necessarily a process ID, the manual says it's futex(uaddr, op, val, timeout, ...), and that if op is FUTEX_WAIT, it "atomically verifies that the futex address uaddr still contains the value val, and sleeps awaiting FUTEX_WAKE on this futex address. [When] timeout is NULL, the call blocks indefinitely."

In your case, it sounds like the process 5546 was sitting there waiting for something -- presumably its thread 5547 or something related, but we can't know exactly what for sure based only on this -- to write that number at the specified memory address.

Note also that you can see threads by using ps -eLf or similar.

Josip Rodin

Posted 2012-06-13T11:33:05.207

Reputation: 333