Proving packets per second bottleneck?

Question

I have a '100Mb' network connection that's currently consistently transmitting at about 20K packets/sec, irrespective of packet size in the range of 300-600 bytes. This yields an observed bandwidth of 25-98Mb. I'm constantly being told that because we've not hit the bandwidth limit, we don't have a line problem. I don't agree.

This connection is, on average, running at 60% of the theoretical maximum PPS rate for a 100Mb (copper Ethernet) line, once packet size is accounted for. (The 100Mb bottleneck is actually fibre of unknown type, so the impact may differ, but I don't think any fibre protocol does better than copper with its interpacket gap.)

My problem is - without access to the routers or fibre hardware (3rd party provided, can't be helped) how can I prove that we are packet limited? Ideally without causing a massive outage in the process :)

Run `mtr` over the link. Wait 30 minutes. Take a screen capture. — chicks, May 08 '16 at 03:43

score 1 · Accepted Answer · answered May 04 '16 at 09:53

Collect the traffic with tcpdump or a similar tool and make a graph of the packet count per time unit. If your assumption is correct, you should see a clear ceiling for the packet count.
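As a rough sketch of the counting step, assuming the capture has been dumped to text with epoch timestamps (for example `tcpdump -tt -n -r capture.pcap > capture.txt`, where both file names are hypothetical), something like the following would give packets per second. The function name `packets_per_second` is made up for illustration.

```python
from collections import Counter


def packets_per_second(lines):
    """Count packets per whole second from `tcpdump -tt` style text lines.

    Assumes each packet line starts with an epoch timestamp such as
    "1462355580.123456". Lines that do not start with a number are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            timestamp = float(fields[0])
        except ValueError:
            continue  # continuation or header line, not a packet
        counts[int(timestamp)] += 1
    # Return the buckets in time order so they can be plotted directly.
    return dict(sorted(counts.items()))
```

Calling `packets_per_second(open("capture.txt"))` and plotting the values should show a flat ceiling near 20K if the link really is packet limited, and a more ragged curve if it is not.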

You may simulate a counterexample by generating many large packets with something like `ping -s 1472 -f`. It may cause a small outage, so avoid running it during traffic peaks. But 30 seconds may be acceptable for solving a larger problem; you decide.

A switch can easily be the bottleneck as well, especially a cheaper one or a black-box router. This was the most common case on a WAN network I worked on. The minimum standard for this kind of traffic was something from the HP ProCurve line. Even an old Cisco was fine. But you have to test it.

It is also worth mentioning that among ISPs we generally used a rule of thumb that a line at 60% utilization is a fully saturated line. The reason is that utilization is basically an average over some period of time. On a shorter time frame you may have overloads from attempting to send too many packets at the exact same moment, which leads to higher latency. Measure the latency as well. Wireshark is a good tool for a quick analysis like this.
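To illustrate how an average can hide short overloads, here is a minimal sketch that compares average utilization with the busiest short window. It assumes you already have `(timestamp, bytes_on_wire)` pairs from a capture. The function name `utilization` and the 10 ms default window are arbitrary choices for the example, not anything the rule of thumb prescribes.

```python
from collections import defaultdict


def utilization(packets, link_bps=100_000_000, window=0.01):
    """Return (average, peak) link utilization as fractions of link_bps.

    packets: iterable of (timestamp_seconds, bytes_on_wire) pairs.
    window:  bucket length in seconds used to find the busiest interval.
    """
    bits_per_window = defaultdict(int)
    for timestamp, size in packets:
        bits_per_window[int(timestamp / window)] += size * 8
    if not bits_per_window:
        return 0.0, 0.0
    # Idle windows between the first and last packet count toward the average.
    n_windows = max(bits_per_window) - min(bits_per_window) + 1
    average = sum(bits_per_window.values()) / (n_windows * window * link_bps)
    peak = max(bits_per_window.values()) / (window * link_bps)
    return average, peak
```

A peak well above the average, or at or beyond 1.0, means that packets arrived faster than the link could send them within that window, so they were queued or dropped even though the long-term figure looks comfortable.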

Last but not least, I have not seen any kind of traffic that can fully saturate a line other than `ping -s 1472 -f` on an otherwise empty line. Once you have multiple connections, you have inefficiencies that lead to lower utilization. Basically, 100Mbit is a theoretical limit under ideal conditions. So the line provider may be right as well, and upgrading the line may be the proper solution.
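For reference, the theoretical ceiling can be worked out from standard Ethernet framing. The sketch below assumes plain copper Ethernet without VLAN tags and that "packet size" means the IP packet size. The fibre segment in the question may use different framing, so treat the result as an upper bound rather than a measurement.

```python
ETH_HEADER = 14      # destination MAC + source MAC + EtherType
FCS = 4              # frame check sequence
PREAMBLE_SFD = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12  # mandatory idle time between frames, in byte times
MIN_PAYLOAD = 46     # shorter payloads are padded to this length


def max_pps(ip_packet_bytes, link_bps=100_000_000):
    """Theoretical maximum packets per second for a given IP packet size."""
    payload = max(ip_packet_bytes, MIN_PAYLOAD)
    wire_bytes = payload + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bps / (wire_bytes * 8)
```

With these assumptions, `ping -s 1472` produces 1500-byte IP packets (1472 bytes of payload, 8 of ICMP header, 20 of IP header), and `max_pps(1500)` comes to about 8,127 packets per second on a 100Mb link. `max_pps(46)` gives the familiar figure of about 148,810 for minimum-size frames.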

Probably better to ping with the average size of the traffic at the time? If I ping with large packets, I change the observed problem. On the other hand, if I flood with packets of the same size and the PPS count doesn't increase, it clearly shows the problem. Or I flood with minimal-size packets: if the network is not currently maxed out, it shouldn't be a problem. If it is, it should reduce average packet size and throughput. — user2702772, May 04 '16 at 10:04
Well, you are partly right, but there is one more issue: the difference between one connection of evenly sized packets and multiple connections from different sources with variously sized packets. The second case will usually perform worse. We did no science, so we just accepted it and acted accordingly. But if your scenario shows the problem as well, then, obviously, the system will not perform better under worse conditions. Only if you find no problem does it not mean you can completely rule this case out. Anyway, as I think about it, your line looks just fine to me. — Petr Chloupek, May 04 '16 at 11:32
