MTR Report Analysis

Question

We have two different routes to a server.

One of them shows a latency of 20,000+ ms at one hop. The other does not.
The latency always appears on the same route and at the same node.
Users complain about internet performance when they reach the server through this route.

My question: is this latency actually causing the issue, or do we have to look into other factors?

Note: We have a good bandwidth monitoring system, and we would know if any of our workstations were abusing bandwidth.

Day 1 Route

Start: Sun Oct  8 13:52:18 2017
HOST: gw131                       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- gateway                    3.3%    30    0.2   0.3   0.2   0.9   0.0
  2.|-- 172.16.65.97               0.0%    30    0.5   0.5   0.4   1.2   0.0
  3.|-- 202.53.163.113             0.0%    30    0.4   0.5   0.4   1.0   0.0
  4.|-- 103.12.172.217             0.0%    30    1.1   0.8   0.5   5.9   1.0
  5.|-- 103.12.172.237             0.0%    30    0.8   0.8   0.5   4.6   0.7
  6.|-- ix-ge-2-0-1-0.tcore3.MLV-  3.3%    30   87.4  87.6  87.3  90.6   0.6
  7.|-- if-ae-4-2.tcore1.MLV-Mumb  0.0%    30  185.0 185.7 184.9 190.6   1.2
  8.|-- if-ae-9-5.tcore1.WYN-Mars  0.0%    30  204.3 203.9 203.6 205.8   0.5
  9.|-- if-ae-8-1600.tcore1.PYE-P  0.0%    30  184.7 185.8 184.5 204.9   3.8
 10.|-- if-ae-11-2.tcore1.PVU-Par  0.0%    30  203.2 203.1 202.8 206.4   0.6
 11.|-- 80.231.153.66              0.0%    30  185.2 185.9 185.2 196.7   2.1
 12.|-- ae-2-3601.ear2.Washington 70.0%    30  24311 23996 23078 24311 412.9
 13.|-- SUNGARD-NET.ear2.Washingt  0.0%    30  263.0 262.4 260.8 293.2   5.8
 14.|-- phl3cr1-te-0-0-1-2.sgns.n  0.0%    30  282.5 282.5 282.1 285.6   0.6
 15.|-- smy1cr1-te-0-0-0-2.sgns.n  3.3%    30  277.8 278.3 277.6 286.4   1.5
 16.|-- dal2cr1-te-0-1-0-2.sgns.n  6.7%    30  281.5 281.8 281.5 284.7   0.5
 17.|-- 66.179.229.126             0.0%    30  284.7 284.7 284.2 291.5   1.3
 18.|-- ???                       100.0    30    0.0   0.0   0.0   0.0   0.0

Day 2 Route:

Start: Mon Oct  9 21:32:07 2017
HOST: gw131                       Loss% Javg  Last   Avg  Best  Wrst StDev
  1.|-- gateway                    0.0%  0.7   0.2   0.6   0.1   4.9   1.0
  2.|-- 172.16.65.97               0.0%  1.3   0.5   1.1   0.3   9.4   1.8
  3.|-- 202.53.163.113             0.0%  3.1   0.8   2.1   0.4  37.1   6.7
  4.|-- 103.12.172.217             0.0%  1.7   0.5   1.5   0.5   6.1   1.7
  5.|-- 103.12.172.249             0.0%  1.7   0.6   1.7   0.5   8.1   1.9
  6.|-- 103-16-155-89-noc.bsccl.c  0.0%  1.4   1.1   1.8   1.0   9.1   1.8
  7.|-- 103-16-152-30-noc.bsccl.c  0.0%  1.7   1.2   2.2   1.0  10.3   2.1
  8.|-- 103-16-152-34-noc.bsccl.c  0.0%  1.5   5.9   6.5   5.5  14.5   1.8
  9.|-- 116.51.31.233              0.0%  0.9  58.0  58.8  58.0  60.9   0.5
 10.|-- ae-17.a00.sngpsi05.sg.bb.  0.0%  2.3  58.1  59.6  57.7  66.8   2.3
 11.|-- ae-0.level3.sngpsi05.sg.b 16.7% 3391 7458. 3324.  73.2 7738. 3008.0
 12.|-- ae-2-3601.ear2.Washington 63.3% 271. 23772 23481 22938 24086 373.8
 13.|-- SUNGARD-NET.ear2.Washingt  0.0% 16.4 309.7 304.5 288.8 353.5  16.1
 14.|-- phl3cr1-te-0-0-1-2.sgns.n  0.0%  9.2 310.4 324.5 310.4 340.9  10.7
 15.|-- smy1cr1-te-0-0-0-2.sgns.n  0.0% 11.1 328.1 317.0 305.9 335.9   9.6
 16.|-- dal2cr1-te-0-1-0-2.sgns.n  0.0%  9.4 310.3 321.1 310.2 334.2   9.2
 17.|-- 66.179.229.126             0.0% 10.0 312.3 323.6 312.3 342.3  10.1
 18.|-- 95-216.205.157.appsitehos  0.0% 15.2 335.8 342.6 325.9 391.6  17.3

Day 3 Route:

gw131 (0.0.0.0)                                                                           Tue Oct 10 14:36:21 2017
Resolver: Received error response 2. (server failure)
                                                                            Packets               Pings
 Host                                                                     Loss% Javg  Last   Avg  Best  Wrst StDev
 1. 202.53.167.129                                                         0.0%  0.4   0.2   0.4   0.1   6.8   0.7
 2. 172.16.65.97                                                           0.0%  0.5   0.5   0.7   0.4   5.4   0.7
 3. 202.53.163.113                                                         0.0%  2.0   2.1   1.9   0.4  22.6   4.2
 4. 103.12.172.217                                                         0.0%  0.7   0.6   0.9   0.5   9.1   1.2
 5. 103.12.172.249                                                         0.0%  0.6   0.7   1.0   0.5   7.7   0.8
 6. 103-16-155-89-noc.bsccl.com                                            0.0%  1.0   1.3   1.7   1.0   9.3   1.3
 7. 103-16-152-30-noc.bsccl.com                                            0.0% 23.1   1.5  12.9   1.1 1002. 105.5
 8. 103-16-152-34-noc.bsccl.com                                            0.0%  0.7   5.8   6.1   5.5  13.6   1.1
 9. 116.51.31.233                                                          0.0% 23.3  58.4  70.0  57.9 1063. 105.9
10. ae-17.a00.sngpsi05.sg.bb.gin.ntt.net                                   0.0%  2.4  58.0  59.4  57.6  77.7   3.2
11. ae-0.level3.sngpsi05.sg.bb.gin.ntt.net                                15.6% 4179 7084. 3128.  69.5 7247. 2939.
12. ae-2-3601.ear2.Washington1.Level3.net                                 26.7% 748. 23577 23876 17859 27371 1073.
    103-16-155-89-noc.bsccl.com
13. SUNGARD-NET.ear2.Washington1.Level3.net                                0.0%  8.1 312.6 305.7 288.8 335.4  10.6
14. phl3cr1-te-0-0-1-2.sgns.net                                            0.0%  8.8 311.1 327.1 309.8 344.1   9.1
15. smy1cr1-te-0-0-0-2.sgns.net                                            0.0%  8.0 327.6 322.1 305.3 344.3   9.7
16. dal2cr1-te-0-1-0-2.sgns.net                                            0.0%  8.2 332.2 328.0 309.4 356.0   9.5
17. 66.179.229.126                                                         0.0% 10.7 334.8 329.5 311.9 359.7  10.2
18. 95-216.205.157.appsitehosting.com                                      0.0% 38.0 331.8 352.5 311.3 1329. 106.2

What tool and flags did you use to get this pretty traceroute? – HSchmale Oct 08 '17 at 20:07
@HSchmale It looks similar to the output from `mtr`. – kasperd Oct 09 '17 at 19:42

MadHatter · Accepted Answer · 2017-10-08T09:02:06.017

9

20 seconds latency on any network would be a performance disaster for any kind of interactive work, if you had it. But you don't.

You experience high latency (and packet losses) in getting traceroute responses from one hop on your journey to a remote host. This is not uncommon, especially if you're using ICMP-based traceroute, because most network devices prioritise actually routing traffic over sending ICMP ttl-exceededs about random PINGs which have died of old age. As you can see from the hops on the far side of that host (13-17), there are no such delays in your traffic passing through the host. Your biggest single hop-to-hop delay is between hops 6 and 7, which seems to be a point-to-point link inside your ISP, one that's probably saturated. You might consider monitoring that and complaining to your ISP if it remains so for some time (no ISP is going to respond to the complaint that you ran one traceroute and saw link problems, and rightly not).
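The reasoning above (a delay that really affects forwarded traffic must also show up at every later hop) can be sketched in a few lines of Python. This is only an illustration: the function names `parse_hops` and `cosmetic_hops` and the `factor=2.0` threshold are assumptions made for this sketch, not part of `mtr`, and the sample lines are hops 11 to 18 copied from the Day 1 report in the question.

```python
import re

# Hops 11-18 copied from the Day 1 report in the question.
REPORT = """\
 11.|-- 80.231.153.66              0.0%    30  185.2 185.9 185.2 196.7   2.1
 12.|-- ae-2-3601.ear2.Washington 70.0%    30  24311 23996 23078 24311 412.9
 13.|-- SUNGARD-NET.ear2.Washingt  0.0%    30  263.0 262.4 260.8 293.2   5.8
 14.|-- phl3cr1-te-0-0-1-2.sgns.n  0.0%    30  282.5 282.5 282.1 285.6   0.6
 15.|-- smy1cr1-te-0-0-0-2.sgns.n  3.3%    30  277.8 278.3 277.6 286.4   1.5
 16.|-- dal2cr1-te-0-1-0-2.sgns.n  6.7%    30  281.5 281.8 281.5 284.7   0.5
 17.|-- 66.179.229.126             0.0%    30  284.7 284.7 284.2 291.5   1.3
 18.|-- ???                       100.0    30    0.0   0.0   0.0   0.0   0.0
"""

# hop number, host, Loss%, Snt, Last, Avg (remaining columns are ignored)
HOP_RE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)\s+([\d.]+)\s+([\d.]+)"
)


def parse_hops(report):
    """Parse report-style hop lines into dicts (hypothetical helper)."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "hop": int(m.group(1)),
                "host": m.group(2),
                "loss": float(m.group(3)),
                "avg": float(m.group(6)),
            })
    return hops


def cosmetic_hops(hops, factor=2.0):
    """Return hop numbers whose average RTT does not persist downstream.

    A hop is flagged when its average RTT is more than `factor` times the
    lowest average RTT of any later responding hop. The threshold is an
    arbitrary assumption chosen for this sketch.
    """
    responders = [h for h in hops if h["loss"] < 100.0]  # skip silent hops
    flagged = []
    for i, h in enumerate(responders[:-1]):
        later_min = min(x["avg"] for x in responders[i + 1:])
        if h["avg"] > factor * later_min:
            flagged.append(h["hop"])
    return flagged


print(cosmetic_hops(parse_hops(REPORT)))
```

On this sample only hop 12 is flagged: its 24-second average vanishes at hop 13, so it reflects how slowly that router answers probes rather than how slowly it forwards traffic.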

As for what's causing problems with "internet performance", that is such an unquantified issue that it is impossible to speculate. If you can get a clearer problem statement from the user it may be possible to design experiments to shed light on it.

As an aside, please don't post images of text output; the links rot and the evidence is lost to your question, and they are unsearchable as text would not be. I've put the image into your question (something you lack the rep to do, I suspect, which isn't your fault) - but best practice is to cut-and-paste text into your question instead.

edited Oct 08 '17 at 09:02

answered Oct 08 '17 at 08:56

MadHatter


Overall a good answer. I have a couple of comments though. **1.** The difference in performance between forwarding packets and generating ICMP errors might not be prioritization but rather that "real" routers can do packet forwarding in hardware but need to use the CPU for anything more complicated such as generating ICMP responses. **2.** Latency of 284ms may simply be due to the speed of light, which the ISP cannot change. **3.** The last paragraph would have been better suited as a comment rather than as part of the answer. – kasperd Oct 08 '17 at 09:51
@kasperd **1**. I completely agree, good point. **2** not from Dhaka to Mumbai, or indeed anywhere much else; 258ms is more than once round the planet, with fibre of refractive index n=1.5. **3** I wrote it as a comment first, but since I was posting an answer, thought it best only to write one thing. Feel free to edit it out and post it as a comment yourself! – MadHatter Oct 08 '17 at 13:06
The output from traceroute will show roundtrip times not the time it took in just one direction, and the speed of light in fiber is slower than in vacuum. That considered every 100km cost about 1ms. So the 285ms works out to about 28500km. Not all the way around the globe but more than half way. So a direct path wouldn't take that long, but it is not always practical to get fibers to take the absolute shortest path. You can't know for sure which hop adds the latency. For example if MPLS is used it can make latency appear to be at an earlier hop than it really is. – kasperd Oct 08 '17 at 13:28
I know light goes slower in glass, which is why I quoted a refractive index in my calculation; your point about round-trips, however, is a valid one. Given that, it seems we agree that adding 285ms from the speed of light alone in a single hop is quite unlikely - and since I strongly suspect the hop is from Dhaka to Mumbai, the fibre would have to be *very* badly laid indeed! Perhaps this is not the best place for this discussion? – MadHatter Oct 08 '17 at 14:38
The traffic suspiciously appears to go through something called `ear2.Washington`. – Oct 08 '17 at 14:45
Thank you MadHatter and kasperd for your insights. Thank you for noticing the latency increase from 6 to 7. We can definitely talk to the ISPs about it. As for the slowness, there is a database in the destination node, where users experience slow form submission time when the route is like this. I have tried to include the text but the format is lost somehow and the entire thing becomes garbage. – Dewan Shamsul Alam Oct 08 '17 at 16:05
@DewanShamsulAlam Just select the text and click the code formatting icon. – kasperd Oct 08 '17 at 18:20
@MadHatter You can't be certain the latency is actually between hop 6 and 7. Notice how from hop 7 to 11 the measured roundtrip remains mostly constant. In some network setups there are devices which don't have enough logic to route packets directly back to the source, so the ICMP error may have to be routed forward until it reaches a device that knows how to route it the other direction. For example that can be the case with MPLS. So it may be that you are actually seeing the roundtrip to hop 11 and back repeated 5 times. – kasperd Oct 08 '17 at 18:38
@kasperd I did indeed notice that, and I think you misunderstand my assertion, which is that the apparent *single greatest contributor* to the latency is hop 6-7. You are right that we don't know it for sure, but we have reasonable grounds to suspect it, and given its probable geographical distance, it shouldn't be such a large contributor. Personally, I think there are grounds to keep an eye on it, and to complain to the ISP if it keeps being such a step. You disagree with such a strategy? – MadHatter Oct 08 '17 at 19:09
@MadHatter The apparent latency of each hop from a traceroute will be misleading. Just look at the following four hops, which in total add minus 0.2 milliseconds. Whether it is worth bringing up the latency with the ISP depends on how far apart the two endpoints physically are. If they are on opposite sides of the world then it is pointless trying to get the ISP to improve the latency because it can't get much better than the 284ms. Instead the OP should try to get a server closer to the users. If OTOH the physical distance is less than 1000km then for sure complain to the ISP. – kasperd Oct 08 '17 at 22:17
The absolute figures, I agree, are both variable and not very reliable. But in my experience the *relative* figures are worthy of closer inspection, which is what I advocate. You give no justification for your 1000km rule of thumb, but since (as I keep pointing out) it is likely that the distance between the two points is less than 2000km, it still seems unlikely to me that speed-of-light issues are responsible. If you have anything further to add I'd appreciate it if you'd write your own answer, as I'm getting slightly fed up dealing with apparently endless *minutiae* in a comments thread. – MadHatter Oct 09 '17 at 06:03
@kasperd I don't think hop 6 to 7 has anything to do with it. Here is another example through a slightly different path but through the same router. – Dewan Shamsul Alam Oct 09 '17 at 15:53
@DewanShamsulAlam If those two routes are between the same pair of hosts, then it may be that the packets are sometimes routed through a suboptimal path. That is something it may be worthwhile asking the ISP about. If we knew which city each end of the connection was in and if you included the full hostnames such that we could better guess where the intermediate routers are located, then we could better assess whether the problem is load on the routers or suboptimal routing. – kasperd Oct 09 '17 at 19:47
@kasperd, I have added another report with full hostname. I think you will be able to extract the location of the routers using whois now. – Dewan Shamsul Alam Oct 10 '17 at 08:41
Please **do not try to ask and have answered a new question in the comments field of an old, accepted answer**. @DewanShamsulAlam, you asked about the 20+s latency in a single traceroute output, and whether it might be responsible for a slowdown you observe. I've answered that, apparently to your satisfaction. If you want to ask a new question (perhaps "*why do my users complain about internet performance*", though I'd vote to close that if it was not *thoroughly* quantified), please **ask a new question**. – MadHatter Oct 10 '17 at 09:25
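
The speed-of-light arithmetic from the comments above can be checked with a short Python sketch. It assumes the refractive index n=1.5 quoted in the thread and treats the reported time as a full round trip. The function name `max_one_way_km` is made up for this illustration, and the 98.1 ms figure is simply the difference between the Day 1 averages at hops 7 and 6 (185.7 - 87.6).

```python
C_VACUUM_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
REFRACTIVE_INDEX = 1.5           # fibre value quoted in the comments


def max_one_way_km(rtt_ms, n=REFRACTIVE_INDEX):
    """Upper bound on the one-way fibre path length for a round-trip time.

    Light in fibre travels at c/n (roughly 200 km per ms for n=1.5), and a
    traceroute time covers the path in both directions, hence the division
    by two. Queuing and processing delays are ignored.
    """
    speed_km_per_ms = C_VACUUM_KM_PER_MS / n
    return rtt_ms * speed_km_per_ms / 2


# End-to-end round trip discussed in the comments.
print(round(max_one_way_km(285)))

# Apparent step between hops 6 and 7 in the Day 1 report.
print(round(max_one_way_km(185.7 - 87.6)))
```

The first figure comes out near the 28,500 km mentioned in the thread. The second is close to 10,000 km of fibre, several times the "less than 2000km" suggested for that link, which is why propagation delay alone is an unlikely explanation for the step, if the step really occurs at that hop.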
