Does PCIe 3.0 x8 provide enough bandwidth for a dual QSFP 40Gbit NIC?

I'm researching dual QSFP 40Gbit network cards for a work project and have a few questions regarding PCIe 3.0's theoretical max bandwidth.

I'm currently looking at a dual QSFP PCIe 3.0 x8 card on CDW (Mellanox MCX314A-BCBT), but I don't think PCIe x8 would provide enough bandwidth for both 40Gbit links at 100% utilization.

Wikipedia states that PCIe 3.0 has a theoretical max bandwidth of 985 MB/s per lane. Thus, by my calculations, PCIe 3.0 x8 would yield a max bandwidth of 7880 MB/s. If this is true, the card would saturate the PCIe connection if both links are at 100% utilization.
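
For reference, here is the arithmetic I'm using, as a small Python sketch (it assumes the 985 MB/s per-lane figure from Wikipedia is the usable data rate):

    # Back-of-the-envelope check: PCIe 3.0 x8 vs. a dual 40 Gbit NIC.
    # Assumes ~985 MB/s of usable bandwidth per PCIe 3.0 lane (Wikipedia figure).

    PER_LANE_MB_S = 985              # usable bandwidth per lane, MB/s
    LANES = 8                        # x8 link
    NIC_GBIT_S = 2 * 40              # two QSFP ports at 40 Gbit/s each

    pcie_mb_s = PER_LANE_MB_S * LANES        # 7880 MB/s
    pcie_gbit_s = pcie_mb_s * 8 / 1000       # 63.04 Gbit/s (decimal units)

    print(f"PCIe 3.0 x8 usable: {pcie_gbit_s:.2f} Gbit/s")
    print(f"NIC at 100% on both ports: {NIC_GBIT_S} Gbit/s")
    print("saturated" if NIC_GBIT_S > pcie_gbit_s else "fits")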

Here are my specific questions:

  • What is the max bandwidth that a dual QSFP 40Gbit network card can output?

  • What is the max bandwidth that a PCIe 3.0 x8 link can handle before saturation?

  • Is there an easy way to calculate this?

SPARCpenguin

Posted 2013-04-23T22:20:04.227

Reputation: 63

There's actually a little more to it than you realise. You need fast memory and a fast CPU, as these also become a bottleneck. Particularly the memory, as memory copy operations become the bottleneck at these speeds. That is why RDMA has become so important and is now a standard feature of 10Gbit+ NICs. – Matt H – 2013-04-23T22:42:27.013

In our app we're using FDR InfiniBand and unfortunately spec'd the CPU toward the low end. This limited the memory speed, and we hadn't anticipated the knock-on effect that this would have on IPoIB performance. – Matt H – 2013-04-23T22:44:14.723

Yeah, that was the next thing on my plate... We'll definitely have to upgrade the memory from dual-channel to triple-channel DDR3-1333, but we might have to go to 1600. – SPARCpenguin – 2013-04-23T22:49:41.953

BTW Welcome to Super User! – James Mertz – 2013-04-23T22:54:52.087

Answers

Doing a little dimensional analysis:

Converting 7880 MB/s to Gbit/s gives 63.04 Gbit/s.

(63.04 Gbit/s) / (40 Gbit/s) = 1.576

If you have a layout like this:

Unit (1) PCIe 3.0 slot, x8 or larger -> one QSFP card providing 1 x 40 Gbps connected to 8 lanes

Unit (2) PCIe 3.0 slot, x8 or larger -> one QSFP card providing 1 x 40 Gbps connected to 8 lanes

... then it will work fine, because each PCIe slot gets its own lanes, even accounting for the fact that some overhead will make it difficult to achieve the theoretical throughput.

If however you have a layout like this:

Unit (1) PCIe 3.0 slot, x8 or larger -> one QSFP card providing (2 x 40 Gbps) connected to 8 lanes

... then it won't work, because instead of having 1.576 times the bandwidth you need, you actually have only 0.788 times as much bandwidth as you need.
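
A quick way to see the difference between the two layouts, as a minimal Python sketch (it assumes the same 985 MB/s usable-per-lane figure from the question; the headroom helper is just for illustration):

    # Headroom ratio: usable PCIe 3.0 x8 bandwidth vs. the 40G ports behind it.

    PCIE_X8_GBIT_S = 985 * 8 * 8 / 1000   # 63.04 Gbit/s usable on an x8 link

    def headroom(ports, gbit_per_port=40):
        """Ratio of PCIe bandwidth to the combined line rate of the ports."""
        return PCIE_X8_GBIT_S / (ports * gbit_per_port)

    print(round(headroom(1), 3))   # 1.576 -> one 40G port per x8 slot: fits
    print(round(headroom(2), 3))   # 0.788 -> two 40G ports on one x8 slot: short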

I guess the manufacturer figures that protocol overhead at the Ethernet layer, some bottleneck further down in the I/O subsystem, or application round trips will slow things down enough that this limitation won't matter. It seems odd that they would design the card so that the maximum theoretical throughput of the two ports exceeds the theoretical throughput of the 8 lanes, but if you really expect to use more than ~78% of 80 Gbps of combined throughput, you might want to just buy two cards (ideally a model with one port apiece, if you can find one) and put them in separate slots, with each slot being at least 8 lanes wide.

allquixotic

Posted 2013-04-23T22:20:04.227

Reputation: 32 256

Your calculation doesn't exactly make sense. You end up with a ratio of MBytes/Gbits – James Mertz – 2013-04-23T22:40:30.203

I was thinking the same thing... The box I'm talking to only supports UDP, so I won't have the overhead of TCP. Unfortunately, that also means packet loss could be a real problem... As for the dual card setup, I'm running this in a 1U box and am limited to one PCIe x16 slot. I don't know if we're going to be running both links at 100% utilization, so we might be able to get away with this card. But I'm going to push to get a PCIe x16 version. – SPARCpenguin – 2013-04-23T22:43:30.703

@KronoS Yes it does. I wrote it poorly, but forgot that Wolfram Alpha implicitly converts the units so that it's comparing apples to apples prior to doing division. Check out that link :) – allquixotic – 2013-04-23T22:47:56.633

@SPARCpenguin As long as you don't use Jumbo Frames at the IP level, I bet you'll have a tremendous amount of packet overhead and you'll never taste anything near 80 Gbps total throughput from the two ports combined. If you do use Jumbo Frames, well, it might actually work, as then 99.9999999999999% of the data going over the PHY is your actual UDP payload.... so if you could actually generate payload at that rate, it could use it... hmm... Yeah, look for x16 if you can get it. – allquixotic – 2013-04-23T22:49:57.310

ok I fixed it... cuz I'm OCD like that :) – James Mertz – 2013-04-23T22:51:46.870

If you do enable Jumbo frames, make sure that all devices expect the same size. One device using (only) 9000 byte frames and one using only a smaller value can result in fun. – Hennes – 2013-04-23T23:06:54.363

7880 MB/s = 66.1 Gbps; 66.1 Gbps / 40 Gbps = 1.65, not 1.576 – psusi – 2013-04-23T23:12:16.177

What is the max bandwidth that a PCIe 3.0 x8 link can handle before saturation?

The max bandwidth of a single PCIe v3 lane is 985 MB/s (8 GT/s raw, which works out to about 7.88 Gbit/s of usable data after 128b/130b encoding).

x8 means that up to 8 PCIe lanes can be used, which gives a theoretical max of 64 Gbit/s raw, or roughly 63 Gbit/s of usable data.
This is less than the 80 Gbit/s that two 40 Gbit links need.

So you cannot run both links at full speed. It might be enough in practice though, especially if the traffic is bursty, as long as both channels do not burst at the same time.
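
For completeness, a small Python sketch of where the per-lane number comes from (PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding):

    # PCIe 3.0 per-lane arithmetic: raw signalling rate vs. usable data rate.

    RAW_GT_S = 8.0                # 8 GT/s raw per lane
    ENCODING = 128 / 130          # 128b/130b line-encoding efficiency

    usable_gbit_s = RAW_GT_S * ENCODING       # ~7.88 Gbit/s usable per lane
    usable_mb_s = usable_gbit_s * 1000 / 8    # ~985 MB/s per lane

    x8_gbit_s = usable_gbit_s * 8             # ~63 Gbit/s for an x8 link
    print(f"{usable_mb_s:.0f} MB/s per lane, {x8_gbit_s:.1f} Gbit/s for x8")
    print("covers 2 x 40 Gbit at line rate:", x8_gbit_s >= 80)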

Hennes

Posted 2013-04-23T22:20:04.227

Reputation: 60 739