I have an Ubuntu server connected to multiple VLAN networks over a single physical 1 Gbps network port. The network connections are configured via /etc/network/interfaces like this:
auto lo
iface lo inet loopback

auto eno1.395
iface eno1.395 inet dhcp
    vlan-raw-device eno1

auto eno1.453
iface eno1.453 inet static
    address 10.1.2.3
    netmask 255.255.255.0
    vlan-raw-device eno1

auto eno2
iface eno2 inet static
    address 192.168.1.2
    netmask 255.255.0.0
That is, IP addresses are assigned to eno1.395 (technically DHCP, but a public static IP in practice), eno1.453 (static IP) and eno2 (static IP). The interface eno1 itself doesn't have an IP address. Note that this is a pure server: it will not route traffic between networks, but it needs to communicate with other servers on multiple networks. This results in the Ubuntu server applying the following qdisc config by default:
$ tc qdisc list
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eno1 root
qdisc fq_codel 0: dev eno1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc mq 0: dev eno2 root
qdisc fq_codel 0: dev eno2 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc noqueue 0: dev eno1.395 root refcnt 2
qdisc noqueue 0: dev eno1.453 root refcnt 2
However, it seems that a long file transfer over the eno1.395 network using the cubic congestion control algorithm causes high latency for bursty traffic on the eno1.453 network. The bursty traffic needs to transfer 0.1–2 MB of data at random intervals with low latency; it requires maybe 50 Mbps on average at most, but short spikes might use 300 Mbps for 10–30 ms (a 30 ms spike at 300 Mbps is roughly 1.1 MB, consistent with those burst sizes).
Is the fq_codel on the eno1 device able to balance the traffic between the VLAN networks eno1.395 and eno1.453? If not, is there some way to configure fq_codel so that it balances traffic in both networks eno1.395 and eno1.453 at the same time (that is, drops packets in the single hog connection over eno1.395 to reduce latency on eno1.453)?
I see that e.g. eno1.395 is currently running the qdisc noqueue, but if I add fq_codel at that level, I'm pretty sure it can only balance traffic within that single VLAN.
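If attaching it at the physical device is the right direction, I guess the experiment would look something like the following (untested sketch; it replaces the default mq split with one shared fq_codel instance, so flows from both VLANs are hashed into the same set of queues):

# Untested sketch: replace the default mq root on the physical device with a
# single fq_codel instance shared by both VLANs:
$ sudo tc qdisc replace dev eno1 root fq_codel
# Inspect per-qdisc statistics afterwards:
$ tc -s qdisc show dev eno1
# Revert to the kernel default if needed:
$ sudo tc qdisc del dev eno1 root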
I know that I could configure static limits for the VLANs using tbf, but that would prevent using the whole network connection nearly all the time, and tbf will introduce extra latency when traffic exceeds about 5 Mbps, if I've understood correctly. I would much prefer to use the full bandwidth and simply balance the traffic with fq_codel, so that the single connection hogging all the bandwidth is slowed down the most.
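For concreteness, the kind of static per-VLAN cap I mean (and would like to avoid) would be something like this; the rate and other parameters are example values, not measured requirements:

# Example static cap on one VLAN; all values are illustrative only:
$ sudo tc qdisc replace dev eno1.453 root tbf rate 500mbit burst 300kb latency 50ms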
The RTT between all the problematic connections is about 0.25 ms. This allows cubic to very effectively take all the bandwidth available for the single long-running connection transferring a lot of data. I guess that using e.g. the vegas congestion control algorithm could be one way to work around this problem, but I'm wondering if this is just an incorrectly configured situation where fq_codel could work perfectly well if configured properly.
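If I end up going the vegas route, the change could apparently be made either system-wide or per route, the latter of which would let me keep cubic for internet traffic (untested sketch; 10.1.2.0/24 is the VLAN 453 network from above):

# System-wide switch (affects all new connections):
$ sudo modprobe tcp_vegas
$ sudo sysctl -w net.ipv4.tcp_congestion_control=vegas
# Or per route, keeping cubic elsewhere (kernel 4.0+; untested here):
$ sudo ip route replace 10.1.2.0/24 dev eno1.453 congctl vegas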
Update 1:
The original wording used the expression "traffic shaping", but it turns out that this expression should be used only for soft-limiting traffic to some predefined throughput. This question is strictly about allowing use of the full bandwidth while avoiding latency for bursty traffic. As such, I only want fair queueing with Active Queue Management (AQM).
I'm currently thinking that the problem is actually caused by the bursts, which may transmit data even faster than I expected, because when I change the congestion control algorithm to vegas, the induced latency goes away and the hog is limited to speeds of about 550 Mbps.
I still haven't been able to figure out how to measure the actual bandwidth required by the bursty traffic. The problem is that the bursts are so short (typically way shorter than 30 ms), and this is a production server, which limits the kind of experiments I can do. I'm pretty sure the bursty traffic happens in long-running TCP/IP connections that often idle, so they may have to re-enter "slow start" before sending again. I'm currently thinking that the idle period could be long enough to allow congestion control algorithms such as cubic, and even cdg, on the hog to take over the full bandwidth, and that I'm seeing the induced latency in burst throughput because the burst exits "slow start" too early due to collision with the hog traffic. I'm pretty sure this is not bufferbloat, but about different TCP/IP streams getting a different balance than I'd like to have.
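One way I can think of to at least snapshot per-connection TCP state (cwnd, ssthresh, RTT and, on newer kernels, delivery rate) without continuous polling is ss; a sketch, using the VLAN 453 subnet from above as the filter:

# Snapshot TCP internals for connections to the VLAN 453 subnet;
# adjust the filter expression as needed:
$ ss -ti dst 10.1.2.0/24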
Open questions:
- How can I measure the actual throughput of the bursty connections without having to poll e.g. /proc/net/netstat continuously? I have about 450–500 mostly idle TCP/IP connections with bursty traffic that I would want to serve with minimal latency. The hog seems to limit the effective throughput these connections can use immediately after an idle period. I think I would need to reserve the maximum expected spike at all times for these connections to avoid latency during the period in which the different TCP/IP streams stabilize again.
- How long can a TCP/IP connection idle before it must go through slow start when it starts transmitting again? Is this adjustable? Is it affected by the hog traffic?
- Is it possible to disable tcp_slow_start_after_idle for given VLAN networks only (see the sketch after this list)? I think slow start after idle makes sense for connections going to the internet, but disabling it for the VLAN 453 connections (which I know are always local and whose network conditions are stable) would make a lot of sense. I see that basically the same question was asked on LKML in 2011: https://lkml.iu.edu/hypermail/linux/kernel/1111.1/02240.html
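As far as I can tell, the knob is global only, which is exactly the limitation (sketch of the system-wide toggle):

# The global knob; 1 (the default) re-enters slow start after an idle period:
$ sysctl net.ipv4.tcp_slow_start_after_idle
net.ipv4.tcp_slow_start_after_idle = 1
# Disabling it system-wide would also affect internet-facing connections:
$ sudo sysctl -w net.ipv4.tcp_slow_start_after_idle=0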
Update 2:
I've been thinking about this a bit more, and I think I'm basically looking for the equivalent of Linux cgroup process scheduling for VLAN networking. A cgroup allows grouping processes and declaring that the collection of processes as a whole can take 50% of the CPU when the whole system is bottlenecked by the CPU, while the same collection can take up to 100% when the system has any idle capacity remaining.
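For comparison, the cgroup v2 behaviour I mean looks roughly like this (sketch; groupA and groupB are hypothetical group names):

# Hypothetical cgroup v2 groups; equal cpu.weight means 50% each only under
# contention, while either group may use 100% of an otherwise idle CPU:
$ sudo mkdir /sys/fs/cgroup/groupA /sys/fs/cgroup/groupB
$ echo 100 | sudo tee /sys/fs/cgroup/groupA/cpu.weight
$ echo 100 | sudo tee /sys/fs/cgroup/groupB/cpu.weight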
I'd want to allow either VLAN connection to take 100% of the physical connection if there's no competing traffic, but traffic in the two VLAN networks should get 50% each when both try to move bits at the same time and the connection is full. And this redistribution of the physical connection should be instant, or nearly so!
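If I translate that requirement into tc terms, I think I'm describing something like a deficit round robin (drr) qdisc on the physical device with one fq_codel child per VLAN: with equal quantums the classes split the link 50/50 under contention, yet either one can use the full link when the other is idle, and drr is work-conserving so the redistribution is immediate. An untested sketch (whether the VLAN tag is visible to the filter at this point may depend on the NIC's VLAN offload):

# Untested sketch: 50/50 per-VLAN sharing with borrowing, fq_codel per class.
$ sudo tc qdisc replace dev eno1 root handle 1: drr
$ sudo tc class add dev eno1 parent 1: classid 1:1 drr
$ sudo tc class add dev eno1 parent 1: classid 1:2 drr
$ sudo tc qdisc add dev eno1 parent 1:1 fq_codel
$ sudo tc qdisc add dev eno1 parent 1:2 fq_codel
# Direct each VLAN's traffic to its own class; note that drr drops packets
# matching no filter, so a catch-all class would be needed for other traffic:
$ sudo tc filter add dev eno1 parent 1: protocol 802.1q flower vlan_id 395 classid 1:1
$ sudo tc filter add dev eno1 parent 1: protocol 802.1q flower vlan_id 453 classid 1:2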
Considering that the actually available physical bandwidth can be detected only by monitoring RTT, ECN marks or packet loss, I'm not sure the connection can be shared correctly without some kind of delay while the fair share is being measured. I think that even with Linux cgroup balancing, the minimum time delay for the balancing is 1/HZ seconds. I guess the logical equivalent for networks would be the RTT, or the RTT multiplied by some constant.