
QUESTION: Remote Connections Timing Out to SSH/HTTP in Parallel from Multiple Locations, but NOT PING, why?

ISSUE: I have a web server whose remote ssh/http connections time out in parallel around 50% of the time, while ping does NOT. The downtime/uptime is irregular and comes in periods of 5-20 minutes. I've run checks through two distributed monitoring services, and their logs confirm what I'm seeing locally. The issue has been going on for 4-5 days, 24/7.

POSSIBLE QUESTIONS:

    * What tests should I run on the server to check resources?
    * What tests should I run to log outbound connections from the server?
    * What tests should I run remotely?
    * What keywords or phrases should I Google?
    * What other questions should I ask?
    * What additional information should I provide?

ACCESS:

    * I have remote ADMIN access to the server, but not physical access.

SYSTEM:

    * Linux-CentOS-5.X, Apache-2.X
    * Virtual machine of unknown type; other systems on the same network are not having remote connection issues

NETWORK:

    * Network information unknown, but other systems on the same network are not having remote connection issues

Thanks in advance!!!

_________________

RECENT UPDATES (1): "Does your server still respond to ping during these outages?" @Greg - Yes, ping IS working... :-) ...but ssh/http are down in parallel during these outages. Also, all tests are done against a static IP. The IP address has been owned as part of an IP block for years, but was only just assigned to this server.

RECENT UPDATES (2): PINGs from me appear to keep the ssh/http connections down. I'm running an automated set of 10 pings every 5 minutes from a distributed network of computers, and leaving them on for the next 10 hours to see if ssh/http stay down, which would be a new pattern.

RECENT UPDATES (3): So far, as a SUDO user, I'm unable to view "/var/log/messages" or the Apache logs. I haven't tried viewing any other logs.

blunders
  • Does your server still respond to ping during these outages? – Greg Oct 09 '10 at 22:53
  • @Greg - Yes, ping IS working... :-) ...but ssh/http are down in parallel during these outages. Also, all tests are done against a static IP. The IP address has been owned as part of an IP block for years, but was only just assigned to this server. – blunders Oct 09 '10 at 23:37

3 Answers


"ping" is only testing up to Layer 3/4 of the TCP/IP stack where SSH & HTTP are actually applications running throughout the 7 layers. The applications can be malfunctioning or overloaded while the TCP/IP stack continues to be functional. With that said, some possible areas to check include:

  • Application logs for connections (perhaps recurring ones from one or more remote hosts)
  • Resource utilization of the applications and their supporting/helper processes
  • Stopping and restarting the applications (if possible/practical) while comparing logging conditions/results
  • Checking firewall logs
  • Running AWStats to spot HTTP traffic trends that may coincide with the timeout events or logging
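To see which layer is failing during an outage, it can help to log more than "ping OK / ping failed". The sketch below is a minimal Python 3 probe (standard library only, meant to run from a monitoring machine rather than the CentOS 5 box) that separates a refused or timed-out TCP handshake from a handshake that completes but never gets an SSH banner; the latter would hint that the kernel is answering but the daemon is not. The status names are made up for this example.

```python
import socket
import time


def probe(host, port, timeout=5.0, read_banner=False):
    """Make one TCP connection attempt and classify the result.

    Returns (status, elapsed_seconds, banner) where status is one of
    "open", "no-banner", "refused", "timeout" or "error".
    """
    start = time.monotonic()
    try:
        # create_connection applies `timeout` to the connect and to later reads
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = b""
            if read_banner:
                try:
                    # sshd sends its version line first; silence after a
                    # completed handshake suggests the daemon itself is stuck
                    banner = sock.recv(256)
                except socket.timeout:
                    return "no-banner", time.monotonic() - start, b""
            return "open", time.monotonic() - start, banner
    except ConnectionRefusedError:
        # an RST came back: nothing listening, or a firewall rejecting
        return "refused", time.monotonic() - start, b""
    except socket.timeout:
        # no answer to the SYN in time: dropped by a firewall, or overloaded
        return "timeout", time.monotonic() - start, b""
    except OSError:
        # anything else (unreachable network, reset during connect, ...)
        return "error", time.monotonic() - start, b""
```

For example, calling probe("203.0.113.10", 22, read_banner=True) and probe("203.0.113.10", 80) once a minute with timestamps (203.0.113.10 is a placeholder for the server's static IP). Only ask for a banner on port 22: sshd speaks first, whereas an HTTP server waits for the request, so an HTTP probe with read_banner=True would always report "no-banner".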

Do both HTTP & SSH timeouts happen in parallel? If so, matching the logs may also yield additional hints as to possible events/activities in common.
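One way to match the SSH and HTTP outages against each other (and later against server logs) is to turn each monitor's pass/fail samples into outage intervals and intersect them. A minimal Python 3 sketch, assuming each monitor can export (epoch_seconds, ok) pairs sorted by time; that sample format is an assumption, not what any particular monitoring service produces.

```python
def outages(samples):
    """Collapse (epoch_seconds, ok) samples into (start, end) outage intervals.

    start is the first failed sample, end the first successful one after it.
    An outage still running at the last sample ends at that sample's time.
    """
    intervals, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals


def overlaps(a, b):
    """Intersect two sorted, non-overlapping interval lists (two-pointer sweep)."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # drop whichever interval ends first; it cannot overlap anything later
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result
```

If most SSH outages have a matching HTTP outage, that points at something both services share (the host, its firewall, or the path to it) rather than at one daemon.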

user48838
  • @user48838 - RE: "Logging for of the applications for connections": unless you mean checking, for example, Apache's logs (which I tried, and got "/var/log/httpd/access_log: Permission denied"; is this normal for a SUDO user?), I've had remote distributed logging set up for days. Though when I added ping after Greg's question, the server was down until I removed the ping. Now it's back to the existing pattern ("downtime/uptime is irregular and are in durations of 5-20 minute periods") and ssh/http are down in parallel. – blunders Oct 10 '10 at 02:36
  • You should be able to open up the logs via a SUDO session/instance. Since the applications are unavailable in parallel, then going through their logs or utilization as well as anything that they may have in common (storage, common/shared libraries, etc.) is in order. Is anything like AWSTATS or NTOP available to assist? Those utilities along with firewall logs will help determine if this is something induced externally. – user48838 Oct 10 '10 at 16:48

Userland is fried. The kernel can respond to pings, but userland is wedged and so no application can receive data from sockets.

Each outage lasts about as long as a reboot (perhaps a watchdog is triggering it?), so there should be a short period in each one when even ping goes unanswered.

What do the system logs show? /var/log/messages and friends?
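The reboot theory can be checked without reading /var/log/messages: if the box is being rebooted (by a watchdog or anything else), its boot time jumps forward after each outage. A minimal Python 3 sketch, assuming you capture the contents of /proc/uptime over ssh during an up period (its first field is seconds since boot) and note the wall-clock time at which you captured it:

```python
import time


def boot_time(uptime_text, now=None):
    """Estimate boot time (epoch seconds) from the contents of /proc/uptime.

    The first field of /proc/uptime is seconds since boot; `now` should be the
    wall-clock time at which the text was captured (defaults to the current time).
    """
    seconds_up = float(uptime_text.split()[0])
    if now is None:
        now = time.time()
    return now - seconds_up
```

Printing time.ctime(boot_time(text, now=captured_at)) for captures taken before and after an outage answers the question: a roughly unchanged boot time means the outages are not reboots.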

Phil P
  • @Phil_P: After Greg asked about the ping, I added it to the remote test I've been running, and after you posted I went to check. According to the logs, the system has been down ever since I started pinging it, though ping itself is still working. I removed the ping test, and ssh/http popped back up. So... I tried to view /var/log/messages and got "Permission denied". Is it normal for read access to be blocked for a SUDO-level user, or would that require additional restrictions? Also, what am I looking for in /var/log/messages, and what are userland and watchdog? THANK YOU!! – blunders Oct 10 '10 at 02:23
  • **Semi-Answer to "what is userland?":** http://en.wikipedia.org/wiki/User_space – blunders Oct 10 '10 at 03:46
  • **Possible-Answer to "what is watchdog?":** http://en.wikipedia.org/wiki/Watchdog_timer – blunders Oct 10 '10 at 03:49
  • RE: "so there is a short period of lack of ICMP ping unreachability in there": so far, since I started logging the ping, there have been ZERO errors returned. I turned it off for a while to see if ssh/http would pop back up so I could try to look at the logs on the server (the server came back up, but every log I tried to view was blocked)...
  • Just turned the pings back on to see whether the server stays up or down based on pings; it's possible the other downtimes are just the server getting pinged by someone else, no idea. If I'm able to keep the server down, though, that's a good thing, since the admin who set up the server says they don't see a problem. That's possible, but this should settle it.
  • It begins to sound as though there is a firewall in front of your box which is tracking flows, and if there are too many concurrent flows, new ones get denied when the state table fills. Thus the two flows from a constant ping are pushing you over the limit, and you can't reach the host concurrently with another flow. One question is: if you ping from one IP and can't reach the host with ssh at the same time, can you reach it with ssh from another IP? If so, you have a per-source-IP limit. Otherwise, it's a really poor-quality firewall and perhaps another customer is filling its state table. – Phil P Oct 13 '10 at 21:38
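Phil P's state-table theory can be made concrete with a toy model. The class below is purely illustrative, assuming a firewall that tracks flows in a table with a global size limit and a per-source-IP cap; it is not how any specific firewall (iptables conntrack or a vendor appliance) implements its limits, and it ignores flow timeouts. It shows the two cases the comment distinguishes: a per-source cap blocks a second flow from the pinging IP but not from another IP, while a full global table blocks everyone.

```python
from collections import Counter


class StateTable:
    """Toy flow table: fixed global capacity plus a per-source-IP cap."""

    def __init__(self, capacity, per_source):
        self.capacity = capacity
        self.per_source = per_source
        self.flows = set()          # (source_ip, service) pairs being tracked
        self.by_source = Counter()  # number of tracked flows per source IP

    def open(self, src, service):
        """Try to admit a flow; return False if the firewall would drop it."""
        flow = (src, service)
        if flow in self.flows:
            return True             # already tracked, packets pass
        if len(self.flows) >= self.capacity:
            return False            # whole table full: every new flow is denied
        if self.by_source[src] >= self.per_source:
            return False            # this source IP has used up its share
        self.flows.add(flow)
        self.by_source[src] += 1
        return True

    def close(self, src, service):
        """Forget a flow, freeing its slot."""
        flow = (src, service)
        if flow in self.flows:
            self.flows.remove(flow)
            self.by_source[src] -= 1
```

With a per-source cap of 1, a constant ping from one address would lock that address out of ssh while a second address still gets in; with a tiny global capacity, traffic from other customers alone is enough to lock everyone out. The addresses in the test (198.51.100.7, 192.0.2.44) are placeholder documentation IPs.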

Run top to check memory, swap space, and process counts. Do you have any swap? If not, add at least file-based swap.
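The memory and swap part of that check can also be scripted so it can be logged alongside the outage times. A minimal Python 3 sketch that parses the text of /proc/meminfo (values in kB), captured from the server for example with "cat /proc/meminfo" over ssh during an up period; the 10% swap and 5% RAM thresholds are arbitrary assumptions. It uses MemFree plus Cached because the CentOS 5 kernel predates the MemAvailable field.

```python
def meminfo(text):
    """Parse /proc/meminfo text into {field_name: integer_value} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def memory_warnings(info):
    """Return human-readable warnings; thresholds are illustrative assumptions."""
    warnings = []
    if info.get("SwapTotal", 0) == 0:
        warnings.append("no swap configured")
    elif info.get("SwapFree", 0) < info["SwapTotal"] * 0.1:
        warnings.append("swap over 90% used")
    # free plus page cache is a rough stand-in for "available" on old kernels
    if info.get("MemFree", 0) + info.get("Cached", 0) < info.get("MemTotal", 0) * 0.05:
        warnings.append("less than 5% of RAM free or cached")
    return warnings
```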

If your Apache process count keeps growing, you may want to lower the maximum process count temporarily. It could be a denial-of-service attack on Apache.
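To see whether the Apache process count really keeps growing, one option is to capture "ps -e -o comm=" (one command name per line, no header) at intervals and count the names. A minimal Python 3 sketch; the limit of 256 is only a placeholder for whatever MaxClients is set to in your httpd.conf (Apache 2.2 prefork), the 90% margin is arbitrary, and with prefork the count also includes the parent httpd process.

```python
from collections import Counter


def process_counts(ps_output):
    """Count processes per command name in "ps -e -o comm=" output."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


def near_limit(counts, name="httpd", limit=256, margin=0.9):
    """True when the number of `name` processes reaches margin * limit.

    limit is a placeholder: use the MaxClients value from your httpd.conf.
    """
    return counts[name] >= limit * margin
```

A count that sits at the ceiling during every outage would support the overload or denial-of-service explanation; a flat count would argue against it.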

You can use netstat to watch inbound and outbound connections; netstat -nt | wc -l should give a relatively stable count.
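To go a step past the total from wc -l (which includes netstat's two header lines), the same output can be broken down by TCP state and by remote address. A minimal Python 3 sketch that parses captured "netstat -nt" text, assuming the usual six-column Linux layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). A large or growing SYN_RECV count, or one remote address holding many connections, would fit the denial-of-service theory.

```python
from collections import Counter


def summarize_netstat(text):
    """Count TCP connections by state and by remote IP in "netstat -nt" output."""
    states, remotes = Counter(), Counter()
    for line in text.splitlines():
        fields = line.split()
        # skip the two header lines and anything that is not a tcp row
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote_ip = fields[4].rsplit(":", 1)[0]  # strip the port
        states[fields[5]] += 1
        remotes[remote_ip] += 1
    return states, remotes
```

Calling remotes.most_common(5) on a capture taken during a busy period shows which hosts hold the most connections.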

Try searching for "denial of service" and "memory leak".

BillThor