How to ssh in the face of huge (25 second) latency?

5

I need to create an ssh connection between two Linux machines running CentOS 5, but the latency could be as high as 25 seconds. To test something approaching this configuration, I simulate a round-trip latency of 7 seconds or more using:

tc qdisc add dev eth0 root netem delay 7s

When I try:

ssh -n -o ConnectTimeout=0 WilliamKF@centos5Machine whoami

It fails after about 1 min 23 sec with:

Connection closed by 10.35.50.114

Note that ConnectTimeout=0 means never time out. Also, simulating a round-trip latency of 6 seconds results in a successful ssh after about 1 min 32 sec.
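
For anyone reproducing the test, here is a minimal sketch of switching the simulated delay between runs. The interface eth0 and the 7s and 6s delays come from the test above; the helper name netem_cmd is hypothetical, and it only prints each tc command so it can be reviewed before being run as root. Keep in mind that netem attached to one interface delays only that machine's outgoing packets.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a tc command for a netem delay.
# Usage: netem_cmd add|change|del|show DEVICE [DELAY]
netem_cmd() {
    action=$1 dev=$2 delay=${3:-}
    case "$action" in
        add|change) echo "tc qdisc $action dev $dev root netem delay $delay" ;;
        del)        echo "tc qdisc del dev $dev root netem" ;;
        show)       echo "tc qdisc show dev $dev" ;;
        *)          echo "usage: netem_cmd add|change|del|show DEVICE [DELAY]" >&2
                    return 1 ;;
    esac
}

netem_cmd add eth0 7s      # start delaying outgoing packets by 7 seconds
netem_cmd show eth0        # check which qdisc is currently attached
netem_cmd change eth0 6s   # adjust the delay without removing the qdisc
netem_cmd del eth0         # remove the delay when the test is finished
```

To apply a printed command rather than just display it, it can be piped to a root shell, for example `netem_cmd add eth0 7s | sudo sh`.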

Is there anything I can do to get ssh to work in the face of extremely high latency on Linux? Why does ssh fail at this threshold? When I run tcpdump I see nothing obviously wrong: the failed attempt produces about 51 packets, whereas the successful one took only around 41. Which packets in the tcpdump output are helpful here?
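
The two timings above allow a rough estimate of how many round trips the ssh exchange needs. This is only a back-of-the-envelope sketch: it assumes the whole elapsed time is spent waiting on round trips (ignoring retransmissions and processing time), and the function names round_trips and projected_seconds are made up for illustration.

```shell
#!/bin/sh
# Rough estimate only: assumes elapsed time is dominated by round trips.

# round_trips ELAPSED_SECONDS RTT_SECONDS -> approximate number of round trips
round_trips() {
    awk -v t="$1" -v rtt="$2" 'BEGIN { printf "%.1f\n", t / rtt }'
}

# projected_seconds TRIPS RTT_SECONDS -> projected total time at that RTT
projected_seconds() {
    awk -v n="$1" -v rtt="$2" 'BEGIN { printf "%.0f\n", n * rtt }'
}

round_trips 92 6          # successful run: 1 min 32 sec at 6 s, prints 15.3
round_trips 83 7          # failed run: closed after 1 min 23 sec at 7 s, prints 11.9
projected_seconds 15 25   # about 15 round trips at 25 s each, prints 375
```

Under that assumption, the successful run needed roughly 15 round trips, while the 7-second run was closed after about 12, before the exchange could finish. The same 15 round trips at 25 seconds each would take on the order of 375 seconds, so any time limit shorter than that, on either end, would have to be raised.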

WilliamKF

Posted 2010-11-19T16:11:27.927

Reputation: 6 916

2 Is there a chance that the 30-second latency can be corrected? That is beyond ridiculously high. What happens when you ping 10.35.50.114? What does traceroute show? – Everett – 2010-11-19T16:26:22.577

I'm thinking a ridiculous amount of proxying is in place... I agree it is ridiculous, so I am interested in responding. Give me some time to look around and check the source code. – RobotHumans – 2010-11-19T18:04:44.147

1 It may require a recompiled ssh library... is this okay? – RobotHumans – 2010-11-19T18:05:03.990

@aking1012 The ssh is being invoked from inside a C++ application, so recompiling ssh and linking the alternate definition into the application is acceptable. – WilliamKF – 2010-11-20T05:29:20.110

@Everett I am unable to control the latency (it actually only reaches 25 seconds). In this case, I am only simulating the latency: 10.35.50.114 is adjacent to the test machine, and the connection is slowed down with the 'tc' command to simulate the latency. – WilliamKF – 2010-11-20T05:31:11.467

2 Just so I understand: you have latency between your client and some server while using SSH, and you are simulating this latency between your client and another computer you have available to you? What I am saying is, latency that high would put your client and the server you are trying to log into approximately 60 times the circumference of the planet apart. That is a LOT (read: an UNGODLY amount) of latency. There is a problem that needs to be fixed between the two computers. Trying to program around a failure that large is really the wrong way to approach the problem. – Everett – 2010-11-20T05:36:19.000

1 The reason is this: you are trying to allow for 25 to 30 seconds of latency for each part of the network connection process. This doesn't just mean your application waits 25 to 30 seconds once; you wait 25 to 30 seconds for the transmission of, and the response to, EVERY packet. If ONE packet gets dropped in a connection-based protocol, you wait for the retransmit. You will NEVER see a successful connection with 30-second latency because you will likely overflow every buffer you have access to. – Everett – 2010-11-20T05:39:08.000

As I've answered your question (unless you feel I've missed something), maybe you could close it? – Everett – 2010-11-22T08:49:27.947

@Everett I was hoping to hear back from aking1012 who indicated they would look at the source code. I'm trying to succeed, not be told and accept that it can't be done. – WilliamKF – 2010-11-23T03:08:06.487

Answers

2

Short answer: with 30 seconds of latency per packet, you will never wait long enough.

Everett

Posted 2010-11-19T16:11:27.927

Reputation: 5 425