
I have an Ubuntu server connected to multiple VLAN networks over a single physical 1 Gbps network port. The network connections are configured via /etc/network/interfaces like this:

auto lo
iface lo inet loopback

auto eno1.395
iface eno1.395 inet dhcp
vlan-raw-device eno1

auto eno1.453
iface eno1.453 inet static
address 10.1.2.3
netmask 255.255.255.0
vlan-raw-device eno1

auto eno2
iface eno2 inet static
address 192.168.1.2
netmask 255.255.0.0

That is, IP addresses are assigned to eno1.395 (technically DHCP, but a static public IP in practice), eno1.453 (static IP) and eno2 (static IP). The interface eno1 itself has no IP address. Note that this is a pure server that will not route traffic between networks, but it needs to communicate with other servers in multiple networks. This results in the Ubuntu server applying the following qdisc configuration by default:

$ tc qdisc list
qdisc noqueue 0: dev lo root refcnt 2 
qdisc mq 0: dev eno1 root 
qdisc fq_codel 0: dev eno1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
qdisc mq 0: dev eno2 root 
qdisc fq_codel 0: dev eno2 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
qdisc noqueue 0: dev eno1.395 root refcnt 2 
qdisc noqueue 0: dev eno1.453 root refcnt 2 

However, it seems that a long file transfer over the eno1.395 network using the cubic congestion control algorithm causes high latency for bursty traffic in the eno1.453 network. The bursty traffic needs to transfer 0.1–2 MB of data at random intervals with low latency; it requires maybe 50 Mbps on average at most, but short spikes might use 300 Mbps for 10–30 ms.

Is the fq_codel on the eno1 device able to balance the traffic between the VLAN networks eno1.395 and eno1.453? If not, is there some way to configure fq_codel so that it can balance traffic in both networks eno1.395 and eno1.453 at the same time (that is, drop packets of the single hog connection over eno1.395 to reduce latency in eno1.453)?

I see that e.g. eno1.395 is currently running the noqueue qdisc, but if I add fq_codel at that level, I'm pretty sure it can only balance traffic within that single VLAN.
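For reference, attaching fq_codel to a VLAN device itself is a one-liner, though as noted above it would only arbitrate between flows inside that one VLAN:

```shell
# Replace the default noqueue qdisc on the VLAN device with fq_codel.
# This only schedules flows within VLAN 453 itself; it cannot see or
# balance against traffic in eno1.395.
tc qdisc replace dev eno1.453 root fq_codel
```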

I know that I could configure static limits for the VLANs using tbf, but that would prevent using the whole network connection nearly all of the time, and tbf will introduce extra latency when traffic exceeds about 5 Mbps, if I've understood correctly. I would much prefer using the full bandwidth and simply balancing the traffic with fq_codel, so that the single connection hogging all the bandwidth would be slowed down the most.

The RTT between all the problematic connections is about 0.25 ms. This allows cubic to very effectively take all the available bandwidth for the single long-running connection transferring a lot of data. I guess that using e.g. the vegas congestion control algorithm could be one way to work around this problem, but I'm wondering if this is just a misconfigured situation where fq_codel could work perfectly well if configured properly.
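For testing, switching the system-wide congestion control to vegas is a two-liner (assuming the tcp_vegas module is available on this kernel):

```shell
# Load the vegas module and make it the system-wide default
# congestion control for new TCP connections.
modprobe tcp_vegas
sysctl -w net.ipv4.tcp_congestion_control=vegas
# Individual applications can also override this per socket via
# setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, "vegas", 5).
```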

Update 1:

The original wording used the expression "traffic shaping", but it turns out that this expression should be used only for soft-limiting traffic to some predefined throughput. This question is strictly about allowing use of the full bandwidth while avoiding latency for bursty traffic. As such, I only want fair queuing with Active Queue Management (AQM).

I'm currently thinking that the problem is actually caused by the bursts, which may transmit data even faster than I expected, because when I change the congestion control algorithm to vegas the induced latency goes away and the hog is limited to about 550 Mbps.

I still haven't been able to figure out how to measure the actual bandwidth required for the bursty traffic. The problem is that the bursts are very short (typically well under 30 ms) and this is a production server, which limits the kind of experiments I can do. I'm pretty sure the bursty traffic happens in long-running TCP/IP connections that idle often, so they may have to re-enter slow start when they resume sending. I'm currently thinking that the idle periods could be long enough to allow congestion control algorithms such as cubic, and even cdg, to let the hog take over the full bandwidth, and that I'm seeing the induced latency in burst throughput because slow start ends too early due to collision with the hog traffic. I'm pretty sure this is not bufferbloat, but rather different TCP/IP streams getting a different balance than I'd like them to have.
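One low-overhead way to sample per-connection behavior without polling /proc/net/netstat is `ss -ti`, which reports the kernel's own per-socket estimates (cwnd, pacing rate, delivery rate), so short bursts between samples are still reflected in the kernel's smoothed counters. A rough sampling sketch, where the 10.1.2.0/24 filter for the VLAN 453 subnet is an assumption to adapt:

```shell
# Periodically dump kernel TCP internals for connections toward the
# VLAN 453 subnet. The send/delivery_rate figures are the kernel's own
# bandwidth estimates, updated per ACK rather than per sample.
while sleep 0.1; do
    ss -tin dst 10.1.2.0/24 | grep -E 'cwnd|rate'
done
```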

Open questions:

  1. How can I measure the actual throughput of the bursty connections without having to poll e.g. /proc/net/netstat continuously? I have about 450–500 mostly idle TCP/IP connections with bursty traffic that I want to serve with minimal latency. The hog seems to limit the effective throughput these connections can use immediately after an idle period. I think I would need to keep the maximum expected spike reserved at all times for these connections to avoid latency during the period in which the different TCP/IP streams stabilize again.
  2. How long can a TCP/IP connection idle before it must restart with slow start when it begins transmitting again? Is this adjustable? Is this affected by the hog traffic?
  3. Is it possible to disable tcp_slow_start_after_idle for given VLAN networks only? I think slow start after idle makes sense for connections going to the internet, but disabling it for the VLAN 453 connections (which I know are always local, with stable network conditions) would make a lot of sense. I see that basically the same question was asked on LKML in 2011: https://lkml.iu.edu/hypermail/linux/kernel/1111.1/02240.html
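As far as I know there is still no per-route knob for tcp_slow_start_after_idle (the sysctl is global), but iproute2 does allow per-destination congestion control and initial window settings, which can approximate "trust the local VLAN more". A sketch, where the subnet and the window values are assumptions and `congctl` requires kernel 4.0+ with the module loaded:

```shell
# Keep the global default (slow start after idle enabled) for
# internet-facing traffic.
sysctl -w net.ipv4.tcp_slow_start_after_idle=1

# For the known-local VLAN 453 subnet, pick a delay-based congestion
# control and larger initial windows on a per-route basis.
modprobe tcp_vegas
ip route replace 10.1.2.0/24 dev eno1.453 congctl vegas initcwnd 20 initrwnd 20
```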

Update 2:

I've been thinking about this a bit more and I think I'm basically looking for the equivalent of Linux cgroup CPU scheduling for VLAN networking. Cgroups allow grouping processes and defining that a collection of processes as a whole can take 50% of the CPU when the whole system is bottlenecked on CPU, but up to 100% when the system has any idle capacity remaining.

I'd want to allow either VLAN to take 100% of the physical connection if there's no competing traffic, but when both try to move bits at the same time and the link is full, each should get 50%. And this redistribution of the physical connection should be instant, or nearly instant!
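The work-conserving split described above (guaranteed 50% each, borrowing up to 100% when the other VLAN is idle) is essentially what HTB's rate/ceil borrowing provides. A sketch, with all rates illustrative and the VLAN-tag classifier an assumption to verify (with hardware VLAN offload the tag may live in skb metadata rather than the packet, so check that the filter actually matches on your kernel):

```shell
# Two HTB classes on the physical device, each guaranteed 450 Mbps and
# allowed to borrow up to 900 Mbps; 900 Mbps is chosen slightly below
# line rate so the queue builds on this host. fq_codel inside each class
# handles per-flow fairness and AQM.
tc qdisc replace dev eno1 root handle 1: htb default 30
tc class add dev eno1 parent 1:  classid 1:1  htb rate 900mbit ceil 900mbit
tc class add dev eno1 parent 1:1 classid 1:10 htb rate 450mbit ceil 900mbit
tc class add dev eno1 parent 1:1 classid 1:20 htb rate 450mbit ceil 900mbit
tc class add dev eno1 parent 1:1 classid 1:30 htb rate 1mbit   ceil 900mbit
tc qdisc add dev eno1 parent 1:10 fq_codel
tc qdisc add dev eno1 parent 1:20 fq_codel
# Steer packets by VLAN tag (em_meta match; verify on your kernel).
tc filter add dev eno1 parent 1: protocol all prio 1 \
    basic match 'meta(vlan eq 395)' classid 1:10
tc filter add dev eno1 parent 1: protocol all prio 2 \
    basic match 'meta(vlan eq 453)' classid 1:20
```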

Considering that the actually available physical bandwidth can be detected only by monitoring RTT, ECN marks or packet loss, I'm not sure the connection can be shared correctly without some delay while the fair share is measured. I think even with Linux cgroup balancing the minimum delay for the balancing is 1/HZ seconds. I guess the logical equivalent for networks would be the RTT, or the RTT multiplied by some constant.

Mikko Rantalainen

1 Answer

  1. fq_codel is not a shaper, but a qdisc that does fair queuing and AQM. tbf and htb are shapers, as is cake in bandwidth mode.

  2. Up until this very moment I thought fq_codel peeled off the vlan headers, and would automatically balance the flows across vlans (and I'm one of the authors!). I suspect the real culprit is your switch or some other bottleneck along this path that is starving the flows you care about.

Simplest suggestion: Try:

tc qdisc replace dev eno1 root cake bandwidth 900Mbit # if you have cake

or tbf + fq_codel

to try and shift the bottleneck to your machine.
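The "tbf + fq_codel" combination mentioned above could look something like this, with the rate, burst and latency values being illustrative guesses to adapt:

```shell
# Shape eno1 slightly below line rate so packets queue on this host
# instead of in the switch, then run fq_codel inside the shaped rate
# for per-flow fairness and AQM.
tc qdisc replace dev eno1 root handle 1: tbf rate 900mbit burst 64k latency 20ms
tc qdisc add dev eno1 parent 1:1 handle 10: fq_codel
```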

  1. Simplest diagnostic: hit your first VLAN with a long-running flow while simultaneously using mtr to measure at which hop your second VLAN is acting up, if you have any hops to measure. flent's test suite is helpful for this.
Dave Taht
    This is the first time I've heard about the cake qdisc and it's looking like my new go-to (I used to use hfsc). +1 for learning something new every day. – Matthew Ife Jun 30 '22 at 08:49
  • I was able to work around the issue simply by changing to congestion control algorithm `vegas` for that system. The single hog connection now seems to take about 550 Mbps without inducing any extra latency to the traffic in the other VLAN. I'm still wondering how to track how short but high the spikes of the bursty traffic actually are. If `vegas` is able to deduce the correct limit, maybe the bursty traffic actually requires 450 Mbps to avoid induced latency? Maybe I was optimistic with my 300 Mbps assumption? – Mikko Rantalainen Jul 01 '22 at 07:44
  • And thanks for the note about correct nomenclature. It seems that I was thinking that "shaping" would also include `fq_codel` because it keeps connections in flows and "shapes" the traffic so that throughput is more fair for each flow (that is, prevent single hog flow from taking more than its fair share). I'll fix the wording in the question. – Mikko Rantalainen Jul 01 '22 at 07:51
  • Does `flent` support a test where one TCP/IP connection would hog all the bandwidth and a couple of open but idling TCP connections send multipacket bursts after random delays? I think I'm basically trying to optimize the system so that any connection that has been idling a lot should be granted more than a fair share of the full bandwidth when it restarts transmitting. Maybe the situation is so dynamic because of the bursty traffic that the `vegas` cc algorithm is the best I can have? I'd prefer using `vegas` for everything over manually setting soft limits or QoS flags for specific flows. – Mikko Rantalainen Jul 01 '22 at 08:28
  • I think the actual issue I'm seeing is falling back to TCP slow start between the bursts and slow start scaling ending too early because of the hog. I added some updated sections at the end of the question. So the induced latency is probably not about losing packets but uneven distribution of packets between the flows where I would want bigger share for the bursty traffic. – Mikko Rantalainen Jul 01 '22 at 10:14