I have a main server with around 100 active SSH tunnel connections from client servers across Canada and the US. Each client server is the same type of device, loaded with the same custom build of Ubuntu. Recently, I tried to set up some new client servers, and they get a connection timeout when attempting to connect to the main server.
Here are some of the important debug steps I have taken and their results:
- The client server times out when attempting to SSH to the main server, even though it can ping the main server
- When I telnet to port 22 on the main server, the connection times out instead of showing the SSH banner
- I can SSH from that client server into any other machine except the main server
- Other machines can SSH into the main server, even from the same IP address as the client servers
- Each client server has the exact same OS build as the other client servers
- There are around 100 active connections from other client servers currently deployed using the same configuration, but only these new ones are experiencing the problem
- I have increased the limit on concurrent unauthenticated SSH connections (MaxStartups) and the TCP listen backlog limit (net.core.somaxconn) to 2000 and 65535, respectively, but this has not improved the situation
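
In case it helps anyone reproduce the telnet check, here is a minimal Python sketch of it. It only tells apart a connect timeout, a refused connection, and a successful SSH banner; it is not a diagnosis. The function name `probe_ssh` and the default timeout are my own choices for illustration.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify how a TCP connection to an SSH port behaves."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                # An SSH server sends its version string first, e.g. "SSH-2.0-...".
                banner = sock.recv(256)
            except socket.timeout:
                return "connected, but no banner before the timeout"
            if banner.startswith(b"SSH-"):
                return "banner: " + banner.decode("ascii", "replace").strip()
            if not banner:
                return "connected, then closed by peer without a banner"
            return "connected, but received unexpected data"
    except socket.timeout:
        # No reply to the connection attempt within the timeout.
        return "connect timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on that port.
        return "connection refused"
    except OSError as exc:
        return f"other error: {exc}"
```

Calling `probe_ssh("main-server.example.com")` (a placeholder hostname) from an affected client server and from a working one should show whether the two behave differently at the TCP level. On the affected clients I would expect "connect timeout", which matches what telnet shows.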
I am stuck and need to figure out why this is happening. Any help would be appreciated. Thanks!