4

The "spoon-feeding problem", as it was recently explained to me, happens when connections to your application server are tied up feeding data across slow network connections to your clients. This makes sense to me and now I understand the importance of putting a highly-concurrent proxy in front of my app servers.

My question is, how did the first person to recognize this problem figure it out? What *nix tools and troubleshooting techniques would help me to recognize this problem if I hadn't had it explained to me?

3 Answers

2

Depending on the architecture, you could also see low user-space CPU utilization but a higher load, because processes are sitting on the wait queue, blocked inside kernel-space network IO routines. If you're running a thread pool system, you'll often see requests denied or queued while CPU usage stays low; again, IO wait counters will be high.

Sometimes creating additional threads/workers alleviates the issue temporarily until your system reaches another critical mass.

In all honesty, it looks a lot like what you'll run into when you have a slow NFS server.
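One way to check the "low user CPU, high IO wait" pattern described above is to compare two readings of the aggregate `cpu` line from `/proc/stat`. The following is a minimal sketch, assuming a Linux host; the function names are made up for illustration, and whether time blocked on network sends is counted as iowait depends on the kernel, so treat the numbers as a hint rather than proof.

```python
import time

# Field order of the aggregate "cpu" line in Linux /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  1 2 3 ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_shares(before, after):
    """Percentage of CPU time per field between two 'cpu' lines."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: b[k] - a[k] for k in a if k in b}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample_proc_stat(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return cpu_shares(before, after)
```

Calling `sample_proc_stat()` on the app server during an incident and seeing a small `user` share next to a large `idle` or `iowait` share would be consistent with workers that are waiting rather than computing.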

McJeff
  • To answer the second half... vmstat and iostat to show CPU time vs. IO wait. Anything that reports process queue statistics and uptime load averages. Also look at netstat for connections with data consistently sitting in the send queue. – McJeff Mar 19 '10 at 17:43
  • Thanks, this last bit is really what I was looking for. Specific utilities to pinpoint exact symptoms. – Don Spaulding Apr 06 '10 at 18:53
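The send-queue check from the comment above could be scripted roughly like this. It is only a sketch: it assumes the usual Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and the helper names are hypothetical.

```python
def queued_senders(netstat_output, min_bytes=1):
    """Return (remote_address, send_q_bytes) for established TCP connections
    whose Send-Q is at least min_bytes, largest queue first."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        send_q = int(fields[2])
        if send_q >= min_bytes:
            hits.append((fields[4], send_q))
    return sorted(hits, key=lambda h: h[1], reverse=True)


def consistently_queued(snapshots, min_bytes=1):
    """Remote addresses with a non-empty Send-Q in every snapshot."""
    sets = [{addr for addr, _ in queued_senders(s, min_bytes)}
            for s in snapshots]
    return set.intersection(*sets) if sets else set()
```

Feeding it several captures of `netstat -tn` taken a few seconds apart separates clients that are always behind (likely slow links) from ones that just happened to have data in flight at the moment of one capture.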
1

If you are using Apache, the typical symptom is a lot of connections in the 'W' ("Sending Reply") state when you look at the server-status page.

Aleksandar Ivanisevic
  • This is another good tip, thanks. I haven't used the server-status page much in the past. I'll look to it next time I need to debug Apache connection issues. – Don Spaulding Apr 06 '10 at 18:54
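For anyone who wants to watch the 'W' count over time rather than eyeball the page, here is a rough sketch. It assumes mod_status is enabled and that you already have the text of its machine-readable view (the `?auto` variant of the server-status URL, which includes a `Scoreboard:` line); fetching it is left out, and the function names are only illustrative.

```python
from collections import Counter


def scoreboard_counts(status_text):
    """Count worker states from the 'Scoreboard:' line of server-status?auto."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    raise ValueError("no Scoreboard line found")


def sending_share(status_text):
    """Fraction of allocated worker slots currently in 'W' (Sending Reply).

    '.' marks an open slot with no worker process, so it is excluded
    from the denominator.
    """
    counts = scoreboard_counts(status_text)
    slots = sum(n for key, n in counts.items() if key != ".")
    return counts["W"] / slots if slots else 0.0
```

A share that stays high while CPU is mostly idle points at workers stuck writing to clients rather than generating responses.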
0

It is not a *nix tool, but I found port mirroring on a managed switch, combined with Wireshark or WildPackets, to be useful. Using filters, one can compare the speed of similar transactions, some of which run quickly and some of which do not.

kmarsh
  • I'm not sure this would work. In the scenario I'm describing, people are unable to connect to the server because it's busy waiting to send data to slow clients. As an admin, I'm not trying to solve "Why does this transaction run slower than that one?"; the problem I'm tasked with is "Why are some customers getting Connection Refused messages?". What makes the whole thing so problematic is that the cause of the problem is unrelated to the people experiencing the symptoms. I've had to solve problems like this before, and I want to be better equipped next time I run into one. – Don Spaulding Apr 06 '10 at 19:00
  • Connection Refused is snoopable on the wire (typically a TCP RST, or an ICMP unreachable message if a firewall is rejecting the connection). If you can automate testing by generating connections at will, you can use the data tap/Wireshark technique to find thresholds. – kmarsh Apr 07 '10 at 12:45