
I have two HP DL380 G8 servers, each with 4x 1TB drives on an HP P420 RAID controller in a RAID 1+0 setup. Eth0 on each is connected to the router, and Eth3 & Eth4 are bonded (LACP) and connected directly between the machines.

If I run

```bash
#!/bin/bash
clear

echo 'Starting disk speed analysis..'
echo -e '\n  Reading different size files (1M, 100M, 1G):\n \e[93m'

# Direct reads from the RAID volume, bypassing the page cache
dd if=/dev/sda of=/dev/null iflag=direct bs=1M count=1000 &> test-results.log
tail -1 test-results.log

dd if=/dev/sda of=/dev/null iflag=direct bs=100M count=10 &> test-results.log
tail -1 test-results.log

dd if=/dev/sda of=/dev/null iflag=direct bs=1G count=1 &> test-results.log
tail -1 test-results.log

echo -e '\n  \e[39mWriting different size files (1M, 100M, 1G):\n \e[93m'

# Direct writes to a test file, bypassing the page cache
dd if=/dev/zero of=/root/testfile oflag=direct bs=1M count=1000 &> test-results.log
tail -1 test-results.log

dd if=/dev/zero of=/root/testfile oflag=direct bs=100M count=10 &> test-results.log
tail -1 test-results.log

dd if=/dev/zero of=/root/testfile oflag=direct bs=1G count=1 &> test-results.log
tail -1 test-results.log

rm test-results.log
echo -e '\e[39m'
```

I get:

```
Reading different size files (1M, 100M, 1G):
1048576000 bytes (1.0 GB) copied, 2.81374 s, 373 MB/s
1048576000 bytes (1.0 GB) copied, 1.98058 s, 529 MB/s
1073741824 bytes (1.1 GB) copied, 1.88088 s, 571 MB/s

Writing different size files (1M, 100M, 1G):
1048576000 bytes (1.0 GB) copied, 0.871918 s, 1.2 GB/s
1048576000 bytes (1.0 GB) copied, 3.08039 s, 340 MB/s
1073741824 bytes (1.1 GB) copied, 3.2694 s, 328 MB/s
```

and

```
Reading different size files (1M, 100M, 1G):
1048576000 bytes (1.0 GB) copied, 2.80229 s, 374 MB/s
1048576000 bytes (1.0 GB) copied, 2.50451 s, 419 MB/s
1073741824 bytes (1.1 GB) copied, 2.136 s, 503 MB/s

Writing different size files (1M, 100M, 1G):
1048576000 bytes (1.0 GB) copied, 1.64036 s, 639 MB/s
1048576000 bytes (1.0 GB) copied, 3.48586 s, 301 MB/s
1073741824 bytes (1.1 GB) copied, 4.5464 s, 236 MB/s
```

And these seem like fair speeds, but when I migrate a 100 GB VM to the other machine over the bonded network, I only get ~60 MB/s of transfer speed, with a short burst of 120 MB/s if that VM is running at the time of the transfer.

[Image: Network vs storage speed of a single VM transfer]

However, storage I/O rates can go quite high, way above the network speed, so I presume storage speed is not the problem, right?
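(For what it's worth, a raw TCP test over the bond should separate network from storage; a minimal sketch with iperf3, where the 10.0.0.x addresses stand in for the actual bond IPs:)

```bash
# On the receiving host: start an iperf3 server
iperf3 -s

# On the sending host: a single TCP stream (one flow, so one LACP link),
# then four parallel streams (multiple flows may spread across links)
iperf3 -c 10.0.0.2 -t 30
iperf3 -c 10.0.0.2 -t 30 -P 4
```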

I am using XCP-ng Center, connected over VPN. It's a fresh install; XCP-ng is v7.6.

Ideally I would expect around 2x125 MB/s transfer speed between the servers. Any ideas why this is not happening?

Maybe someone with a similar stack could share their experience? Thanks!


1 Answer


Bonding won't help here because your source and destination IP addresses are fixed. LACP calculates a hash over these IP addresses (and optionally also the TCP port numbers) to determine which physical link to use, so a single TCP session always puts all packets of the same flow on the same physical link.
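If the bond is the Linux bonding driver, you can check which hash policy is in use; a sketch, assuming the bond is named bond0:

```bash
# Show the bonding mode and transmit hash policy currently in effect
grep -iE 'bonding mode|hash policy' /proc/net/bonding/bond0

# layer3+4 also hashes the TCP/UDP ports, but a single migration
# session still has one port pair, so it still maps to one link
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
```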

A gigabit Ethernet interface carries at most 1000 Mbps (125 MB/s) of Ethernet frames; the 8b/10b encoding used on 1000BASE-X is already accounted for in its 1.25 GBd line rate, so encoding does not eat into that figure. What does is the layer 2, 3, and 4 overhead:

  • 18 bytes of Ethernet header
  • 20 bytes for an IP header; and
  • 20 bytes for a TCP header

That gives 58 bytes of overhead per 1518-byte layer-2 frame, or about 4 percent. Subtracting this from the 125 MB/s leaves roughly 120 MB/s. This still excludes the 8-byte preamble and the interpacket gap, which is 96 bit times (96 ns) on gigabit Ethernet; with those included, the achievable TCP goodput is about 118 MB/s.
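As a quick sanity check of that arithmetic (assuming a 1500-byte MTU and no TCP options such as timestamps):

```bash
# payload per frame:  1500 - 20 (IP) - 20 (TCP)       = 1460 bytes
# on the wire:        1518 + 8 (preamble) + 12 (IPG)  = 1538 bytes
awk 'BEGIN { printf "%.1f MB/s\n", 125 * 1460 / 1538 }'   # ~118.7 MB/s
```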

The speeds you're getting thus aren't as abnormal as you might expect: a single migration flow can never use more than one of the bonded links.

Edit to answer rjt's questions:

LACP is a protocol that creates a virtual interface bonding together one or more physical interfaces (up to sixteen, of which at most eight can be active). The protocol can be implemented in servers, routers, and switches, and there is no requirement that one end of the link be a switch.
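On Linux, for example, the in-kernel bonding driver speaks LACP itself, so two servers can be cabled back to back; a sketch, where the interface names and addresses are assumptions and the equivalent commands run on both hosts:

```bash
# Create an 802.3ad (LACP) bond directly between two hosts - no switch needed
modprobe bonding
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4
ip link set eth3 down; ip link set eth3 master bond0
ip link set eth4 down; ip link set eth4 master bond0
ip link set bond0 up
ip addr add 10.0.0.1/24 dev bond0   # use 10.0.0.2/24 on the other host
```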

LACP cannot use round robin. Below is a quote from CCIE Routing & Switching v5.0, the official certification guide for the CCIE exam, page 155 (emphasis mine):

> This hashing function is deterministic, meaning that all frames in a single flow produce the same hash value, and are therefore forwarded over the same physical link. Hence, the increase in the available bandwidth is never experienced by a single flow; rather, multiple flows have a chance of being distributed over multiple links, achieving higher aggregated throughput. The fact that a single flow is carried by a single link and thus does not benefit from a bandwidth increase can be considered a disadvantage; however, this approach also prevents frames from being reordered. This property is crucial, as EtherChannel---being a transparent technology---must not introduce impairments that would not be seen on plain Ethernet.
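The Linux bonding driver, by contrast, offers a balance-rr mode (mode 0) that does stripe a single flow across all links, but that mode is not LACP and, as the bonding documentation linked in the comments below notes, it can deliver TCP segments out of order; a sketch with the same assumed interface names:

```bash
# Round-robin bond: a single TCP flow is spread over both links,
# at the price of possible out-of-order delivery (which hurts TCP throughput)
ip link add bond0 type bond mode balance-rr
ip link set eth3 down; ip link set eth3 master bond0
ip link set eth4 down; ip link set eth4 master bond0
ip link set bond0 up
```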

Tommiie
  • The question includes "Eth3&Eth4s are bonded (LACP) and connected directly between machines". "Connected directly" means the ethernet cable goes directly from one NIC to the other NIC. No switch involved at all. How does one have LACP without a switch? – rjt Jan 02 '20 at 20:10
  • @Tommiie, LACP can use a round-robin algorithm to send frames, so the same path is not used. Does that help TCP over LACP? – rjt Jan 02 '20 at 20:37
  • Can you quote a source on the round robin for LACP? I've edited my answer in an attempt to answer both your questions. – Tommiie Jan 03 '20 at 05:59
  • Search these Linux man pages, [bonding](https://www.kernel.org/doc/Documentation/networking/bonding.txt) or [teaming](https://github.com/jpirko/libteam/wiki/Tutorial), for "round". I am simply making the jump: since Linux supports it, and high-end switches such as Arista are software-defined using Linux, they support it as well. In fact, the _default_ bonding mode is balance-rr (mode=0). However, the bonding man page mentions that round robin may not comply with LACP's out-of-order requirements. – rjt Jan 03 '20 at 21:53
  • @Sakvojage could increase performance and saturate more links by changing xmit_hash_policy to layer3 (hashing on IP addresses) and using more IP addresses over the same bond. Alternatively, build bonds from groups of two cables each and spread traffic over multiple IP addresses. – rjt Jan 03 '20 at 22:03