
I'm encountering unfamiliar Apache symptoms and I'm curious if anyone here knows how to diagnose them. I've got a pair of app servers running mod_python and Apache, recently upgraded to Django 1.2.3. They plug into a db server that runs PostGIS and memcached.

Here's what I'm seeing in 'top':

  • The app servers' httpd processes climb to the low 20s.

  • The app servers' CPU %wa (I/O wait), which in the past had almost always been near zero, starts dancing around 50%.

I restart Apache and the problems go away. It has only recurred once so far, but I'm worried it might happen again, and I'd like to get to the bottom of it. Has anyone seen this before? Know the smart way to deal with it? I'm planning to closely examine the I/O operations if it crops up again, but I don't have a good grip on how to do that.

palewire

1 Answer


Use strace -T -f -p 1154, where 1154 is the process ID of the offending process. The -T flag appends the time spent in each system call to the end of its line, and -f follows child processes. Then use grep, sed/awk, and lsof to sort out which system calls are taking a long time. You will likely find that a variant of read() or write() against a particular file is slow. Start by inspecting the list of open files with lsof -p 1154 to get the file descriptor (e.g. 5), then search the strace output for read(5, and look at the number in angle brackets at the end of the line (e.g. <0.000560>), which is the elapsed time in seconds. The larger this number, the more you need to investigate the device the file lives on, which is why lsof is so handy: it maps the descriptor back to a path.
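If grepping by hand gets tedious, the same idea can be sketched in Python: total the -T timings per system call and file descriptor, then list the worst offenders. This is only a rough sketch, not a polished tool. The function name slowest_calls and the regular expression are my own, and it assumes lines shaped like typical strace -f -T output. It skips "<... resumed>" continuation lines, so calls that strace splits across two lines are not counted.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [pid  1154] read(5, "abc", 4096) = 3 <0.000560>
#   1154 write(7, "x", 1) = 1 <0.000100>
# The optional prefix is the pid that strace -f adds; the trailing <...>
# is the elapsed time that strace -T adds.
LINE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'   # optional pid prefix
    r'(?P<call>\w+)\((?P<fd>\d+)?'     # syscall name, numeric first arg if any
    r'.*<(?P<secs>\d+\.\d+)>\s*$'      # elapsed seconds at end of line
)

def slowest_calls(lines, top=10):
    """Sum elapsed time per (syscall, fd) and return the largest totals."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # signals, resumed calls, and anything unparsed
        fd = m.group("fd")
        key = (m.group("call"), int(fd) if fd is not None else None)
        totals[key] += float(m.group("secs"))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Feed it the lines of a saved strace log (for example, one written with strace's -o option), then look up the descriptors at the top of the list in the lsof output to see which file and device they belong to.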

By the way, on some systems I have to send SIGCONT to the process and its children afterwards, because strace left them stopped with SIGSTOP. As root: cd /proc/1154/task; kill -CONT *; cd /
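For what it's worth, the same cleanup could be written in Python along these lines. It is a sketch that mirrors the one-liner above: it walks the task IDs under /proc/<pid>/task and sends each one SIGCONT. The name resume_tasks and its proc_root and send parameters are hypothetical, added only so the function can be tried against a fake directory. It assumes Linux and enough privilege to signal the target.

```python
import os
import signal

def resume_tasks(pid, proc_root="/proc", send=os.kill):
    """Send SIGCONT to every task ID listed under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    # Each entry in the task directory is a numeric thread (task) ID.
    tids = sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
    for tid in tids:
        send(tid, signal.SIGCONT)
    return tids
```

Calling resume_tasks(1154) as root would be the equivalent of the shell command, and the returned list shows which task IDs were signalled.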

zerolagtime