Intel 10Gbps card bandwidth

0

I have an Intel 10Gbps card based on the 82599 10GbE controller. The card has two ports. The controller's datasheet says it supports PCIe 2.0 (2.5 GT/s or 5.0 GT/s).

Now, PCI-SIG's FAQ page (link: https://www.pcisig.com/news_room/faqs/pcie3.0_faq/#EQ3) says that at a 5.0 GT/s signaling rate, PCIe delivers an interconnect bandwidth of 4 Gbps per lane, which works out to 500 MB/s per lane per direction.

I ran a netperf test on the card (I connected two of these cards back-to-back over an optical fibre cable, with no switches in between) and measured a bandwidth of around 3.3 Gbps (which is around 400 MB/s).
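(For anyone reproducing this: the exact command isn't critical, but a plain TCP stream test between the two boards looks roughly like the following; the address is just an example.)

netserver                                    # run on the receiving board
netperf -H 192.168.10.2 -t TCP_STREAM -l 30  # run on the sending board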

Is my card under-utilized, or do those numbers add up? Why wouldn't I get the full 10 Gbps out of the card (instead of only 3.3 Gbps)?

(The card is x4, in an x8 slot.)

Update: The network card goes into a slot that is configured as PCIe 3.0 and is x8 wide (it supports up to 8.0 GT/s). As for the board itself, it's a Freescale board (processor: T4240). So I figured the board should be fine, with the card being the slower of the two.
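(For anyone checking the same thing: the link speed and width the card actually negotiated can be read from lspci. The bus address below is just an example; look yours up first.)

lspci | grep 82599                  # find the card's bus address
lspci -s 01:00.0 -vv | grep LnkSta  # shows e.g. "LnkSta: Speed 5GT/s, Width x4"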

Thanks in advance.

Vigneshwaren

Posted 2014-09-08T15:05:09.320

Reputation: 245

What motherboard are you using and in which slots exactly are cards (including the network card) located? – Daniel B – 2014-09-08T15:17:47.170

You have this 10Gbps card connected to something. What are the specifications of that hardware? – Ramhound – 2014-09-08T15:29:35.810

Answers

3

There are many reasons why you may not be seeing 10Gbps across the link. I can offer the following:

  • PCIe 2.0 offers an effective bandwidth of 4Gbps per lane. A PCIe 2.0 4x card in a PCIe 2.0-or-better 8x slot will have a 4x link, providing 16Gbps of effective bandwidth. That is plenty to saturate one 10GbE port at full line rate (though not quite enough for both ports at once), assuming the rest of your hardware can keep up; see the arithmetic sketch right after this list.
  • Many general-purpose desktop and server operating systems are not configured by default to handle high-bandwidth networking.
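As a quick sanity check on that arithmetic (shell arithmetic, just to make the numbers explicit):

# PCIe 2.0 signals at 5.0 GT/s per lane and uses 8b/10b encoding (8 data bits per 10 line bits)
echo $(( 5000 * 8 / 10 ))      # 4000 Mbps effective per lane per direction (= 500 MB/s)
echo $(( 4 * 5000 * 8 / 10 ))  # 16000 Mbps for a 4x link -- more than one 10GbE port needs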

To get full performance out of that card, you'll want to:

  • Disable anything that will restrict performance of networking or CPU speed/interrupt processing:

Linux Example:

service irqbalance stop
service cpuspeed stop
chkconfig irqbalance off
chkconfig cpuspeed off
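(On newer distributions that use systemd rather than SysV init scripts, the equivalent for irqbalance would be roughly the following; the cpuspeed counterpart varies by distribution, so check what your frequency-scaling service is called.)

systemctl stop irqbalance     # stop the service now
systemctl disable irqbalance  # keep it from starting at boot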
  • Enable 9K jumbo frames with a high transmit queue length:

Linux Example:

ifconfig eth2 mtu 9000 txqueuelen 1000 up
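(The same setting with the iproute2 tools, assuming the interface really is eth2 here, would be:)

ip link set dev eth2 mtu 9000 txqueuelen 1000 up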
  • Increase the network buffers so that they can keep the card saturated with data:

Linux Example:

# -- 10gbe tuning from Intel ixgb driver README -- #

# turn off selective ACK and timestamps
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0

# memory allocation min/pressure/max.
# read buffer, write buffer, and buffer space
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000

net.core.rmem_max = 524287
net.core.wmem_max = 524287
net.core.rmem_default = 524287
net.core.wmem_default = 524287
net.core.optmem_max = 524287
net.core.netdev_max_backlog = 300000
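(These are sysctl settings. One way to apply them, assuming they were saved to a file such as /etc/sysctl.d/10gbe.conf -- the path is just an example -- is:)

sysctl -p /etc/sysctl.d/10gbe.conf  # load the settings from that file
sysctl net.ipv4.tcp_rmem            # spot-check that a value took effect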

There is further tuning you can do to the PCI link, such as bumping the maximum block size to 4K. Properly tuned, you should be able to push about 9.90Gbps across each link.
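(If the card sits on PCIe rather than PCI-X, the closest analogue of that block size is probably the maximum payload / read request size; I'm not certain how far it can be raised, but the values the link negotiated can at least be inspected. Bus address is an example, as above.)

lspci -s 01:00.0 -vv | grep -E 'MaxPayload|MaxReadReq'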

Keep in mind that server and client, and every hop along the way (switch/router) must be similarly tuned in order to not bottleneck the data flow.

Darth Android

Posted 2014-09-08T15:05:09.320

Reputation: 35 133

Well, I previously ran across this document https://www.kernel.org/doc/ols/2009/ols2009-pages-169-184.pdf which lists all of these optimizations, and I had already applied them on my system. Initially I maxed out at 2.5Gbps, and I pushed it to 3.3Gbps by playing around with exactly those values. I guess I'll have to look at those numbers again.

– Vigneshwaren – 2014-09-08T16:05:41.097

@Vigneshwaren I assume you mean that you performed these optimizations on both of your systems? Or did you have two cards in the same system? – Darth Android – 2014-09-08T16:09:55.080


Nothing you mention above applies specifically to 10Gb Ethernet. You didn't mention the most important reason for not getting full throughput, and that's the 8b/10b encoding (https://en.wikipedia.org/wiki/8b/10b_encoding) that knocks 20% of your bandwidth off the top.

– Astara – 2014-09-08T16:11:33.710

BTW..how do you do the 4k PCI block size tune? Never had luck w/that one (I use a 9k on the wire, but never had luck changing the pci BS)... – Astara – 2014-09-08T16:14:05.687

Oh no, to be very clear: both systems (I had two identical boards with the same CPU) had one of the 10Gig cards each (I had two of the Intel 10Gig cards as well). The boards were then connected to each other via an optical fibre cable with absolutely no other device in between (switches, routers, modems, the Internet, etc.). And I would like to know how you tune the PCI block size as well. Kernel config maybe? – Vigneshwaren – 2014-09-08T16:15:14.617

@Astara The PCI tuning is in the linked article, but is specific to PCI-X. If there is something similar for PCIe, I do not know it. – Darth Android – 2014-09-08T18:31:39.827

You said ("every hop along the way (switch/router) "). Starting w/the 10Gb speed, they don't support contention / collisions on the line -- so any interconnects must be switches (found that out today). Also, have seen article for PCI-X, but that bus is way old and you'd be hard pressed to run full speed w/1Gb. As for your figures above, are you using those w/10Gb? They look more like 1Gb figures. Maybe depends on memory, but on x64, I try to use larger buffs. – Astara – 2014-09-08T18:51:06.487

@Astara The numbers are from the linked article, in which the author managed to get 9.90Gbps over a 10Gbps link. I'm afraid most of my personal 10Gbps knowledge is theoretical due to the prices of 10Gbps hardware making it difficult to justify for home use (as much as I'd like to get some for my fileserver) – Darth Android – 2014-09-08T19:06:34.657

@DarthAndroid - re: $$ I know the feeling... I compromised. I ratcheted down my minimal setup to leave out a switch and only connect my home server & my desktop. Switches that supported teaming/bonding would have more than doubled my costs. But I could explore them using the dual cards (Intel x540t2). Unfortunately, I really can't get much if any benefit using 2 cards (insufficient CPU). Tried interrupt and process affinities among other things. But if you think of min as 2 cards, more affordable. – Astara – 2014-09-08T22:25:22.170

1

Same here... it turns out it is because the 10Gbps protocol revived the old modem-style encoding, with a start/stop bit around 8 bits of data.

Today's rate:

Read:
512+0 records in
512+0 records out
4294967296 bytes (4.0GB) copied, 6.37415s, 642.6MB/s

Write:
512+0 records in
512+0 records out
4294967296 bytes (4.0GB) copied, 6.78951s, 603.3MB/s

(this is run on a Win7 client talking to null files on the Linux end -- /dev/zero for reads, and /dev/null for writes).
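(For reference, numbers like these can be collected with something along the following lines on any client that has dd available; the share paths are hypothetical, and 512 records of 8 MiB gives the 4 GiB totals shown above.)

dd if=/mnt/server/zerofile of=/dev/null bs=8M count=512   # read test: pull data over the share
dd if=/dev/zero of=/mnt/server/nullfile bs=8M count=512   # write test: push data over the share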

For 'smb/cifs' and a single client, bonding 2 cards together doesn't help throughput (since smb/cifs is a 1 connection/client protocol). :-(

P.S. - This was not, BTW, true on 1Gb, and I don't think it is true on 40Gb... Lame! Feels like the disk-space MB != 1024**2 bytes issue when it first came out... a way of making it sound better than it actually is...

Astara

Posted 2014-09-08T15:05:09.320

Reputation: 569

If you're seeing <5Gbps on a 10Gbps link, that's because either the server or the client is not properly tuned for 10Gbps, or because of SMB/CIFS protocol overhead. Raw network performance of a 10Gbps link should be very close to 10Gbps if you've set it up right. – Darth Android – 2014-09-08T15:53:19.540

Re: protocol overhead... never said otherwise. On 1Gbit, smb/cifs allows up to 125MdB writes (that's 125 million, not 1024^3), and 119MdB reads. Note that the above are in MB/GB speeds using 1024 as a base. – Astara – 2014-09-08T15:59:18.427

I'm confused by your "P.S." statement, as regardless of how the data is encoded on the wire, 10Gbps is 10Gbps. The link layer (1Gbps, 10Gbps, 40Gbps) operates at the given speed, and any overhead due to encoding on the Ethernet or IP level (for headers and such) will apply to all of them. – Darth Android – 2014-09-08T16:12:10.940

From the link above on 8b/10b, it lists the technologies this applies to. One that is not 8b/10b encoded is twisted-pair-based 1000Base-T Gigabit Ethernet, which seems to be the most common. They also mention the PCIe bus for speeds below 8 GT/s -- so maybe when you get to 40 & 100 you are on a faster bus? That last is a guess, but I remember the same article that told me about 10G using 8b/10b saying that 1000BT didn't, and I thought it also said 40/100 didn't. But I can't find that article. The link shows the most common Gb variant not using that encoding. – Astara – 2014-09-08T16:22:16.360

Oh, I see the bad assumption I made. Twisted copper pair (1Gbps, 10Gbps, 40Gbps, +) does not use 8b/10b, but the fiber variants do. However, 10Gbps is still the effective speed ("These standards use 8b/10b encoding, which inflates the line rate by 25%") - so the physical fiber lines are carrying 12.5Gbps raw or 10Gbps effective. The math about PCIe 2.0 speeds at the top of my post does already include the overhead for 8b/10b encoding (hence the use of "effective bandwidth of 4Gbps per lane") – Darth Android – 2014-09-08T18:25:51.747