Why do strace/truss sometimes 'fix' stuck processes?

4

1

Sometimes you have a process that's been stuck for a while, and as soon as you go to poke at it with strace/truss just to see what's going on, it gets magically unstuck and continues to run! So merely 'observing' these programs has some impact on how they run. What's happening here? Did strace (I guess via ptrace(2)?) send a signal that caused the program to stop blocking, or something like that?

I've seen this several times -- most recently on Linux RHEL 4 (with a Perl script mucking with processes and doing some network IO in that case), but in a few other contexts as well. Unfortunately, I can't reproduce this, as it tends to happen ... in times of crisis. But my curiosity remains. :-)

Any elucidation appreciated.

Emmel

Posted 2010-04-23T05:48:53.340

Reputation: 351

Answers

0

Maybe it is a bug, either in the kernel or in the program you are tracing?

The program may have an incorrectly implemented event loop that waits for the wrong things, but goes on to wait for other things once a call is interrupted with EINTR.

Example:

for (;;) {
  select(...);               // wait until some descriptor is readable
  if (FD_ISSET(...i...)) {   // FD_ISSET tests a descriptor; FD_SET would add one
    read(...i...);
    write(...j...); // Naive blocking write
  }
}

It will work in a trivial test, but the whole program may block if any write blocks.
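To illustrate how such a loop could avoid hanging in write, here is a minimal sketch that assumes the output descriptor can be switched to non-blocking mode. The helper names set_nonblocking and errno_when_pipe_is_full are made up for this example; it fills a pipe that nobody reads and reports the errno of the first write that cannot proceed, instead of sleeping forever.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: put fd into non-blocking mode so that write()
 * fails with EAGAIN instead of sleeping when the reader is not keeping up.
 */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical demo: fill a pipe that nobody reads from.
 * Returns the errno of the first write() that could not proceed,
 * or -1 if the setup failed.
 */
int errno_when_pipe_is_full(void)
{
    int fds[2];
    char chunk[4096] = {0};
    int result = -1;

    if (pipe(fds) != 0)
        return -1;

    if (set_nonblocking(fds[1]) == 0) {
        for (;;) {
            ssize_t n = write(fds[1], chunk, sizeof chunk);
            if (n == -1) {
                result = errno;   /* the pipe is full; a blocking fd would hang here */
                break;
            }
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

In a real event loop you would then add the output descriptor to select's write set and retry the write once it is reported writable, rather than spinning on it.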

Suspending and resuming the program, which is what happens when strace attaches to it, can abort the blocking write and let the main loop continue.

Vi.

Posted 2010-04-23T05:48:53.340

Reputation: 13 705