
I identified that data transfer within my cluster will be bottlenecked by the network interconnect (I'm already saturating the dual SAS connection to my storage with sequential workloads), and I'm undecided between 10 GbE and 40/56 Gb InfiniBand to mitigate the problem.

I'm leaning towards using dual-port 10 GbE NICs and link aggregation to increase the throughput between my servers. However, I've read that throughput doesn't increase linearly with the number of links. What kind of throughput should I expect? If it depends on my working set, how do I go about estimating it?
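A rough way to frame the estimate, as a minimal sketch: most bonds (LACP/802.3ad with layer3+4 hashing) pin each flow to a single member link, so the aggregate depends on how many concurrent flows you have. The flow counts and the assumption that any link with at least one flow can be driven to line rate are illustrative, not measurements from this cluster:

```python
# Back-of-envelope estimate of throughput over a bonded link group when
# the bond hashes each flow to a single member link (as LACP / 802.3ad
# layer3+4 hashing does).  All numbers are illustrative assumptions.

def expected_bond_throughput(n_flows, n_links=2, link_gbit=10.0):
    """Expected aggregate throughput in Gbit/s, assuming each flow is
    hashed uniformly onto one member link and any link that receives
    at least one flow can be driven to line rate."""
    # Probability that a given member link receives at least one flow.
    p_link_used = 1.0 - (1.0 - 1.0 / n_links) ** n_flows
    return n_links * p_link_used * link_gbit

for flows in (1, 2, 4, 8, 16):
    print(f"{flows:2d} concurrent flows -> "
          f"~{expected_bond_throughput(flows):.1f} Gbit/s expected")
```

The key point this model makes visible: a single sequential stream never exceeds one member link (~10 Gbit/s), and the bond only approaches 2×10 Gbit/s once there are enough distinct flows to spread across both ports.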

elleciel
  • I guess any non-handwaving answer would require more details of your setup to build upon. Can you give a more detailed characterization of your sequential workload: is it only long sequential reads? How is the storage configured? – Dmitri Chubarov Mar 11 '14 at 15:57

1 Answer


Bonding 1 GbE links scales almost linearly. I haven't tried it with 10 GbE links, though, and I suspect the scaling is less than linear. The reason is that memory and CPU become the bottlenecks. I've seen this with FDR InfiniBand, and I don't doubt the same can happen with Ethernet despite its offload mechanisms.
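To see why memory becomes the limit, here is a minimal sketch of the memory-bus traffic a given wire rate generates when the CPU copies every byte (no RDMA, no zero-copy). The "crossings per byte" figure is an assumed value for a typical receive path, not a measurement:

```python
# Rough estimate of memory-bus traffic generated by a given wire rate
# when every received byte is copied by the CPU.  The crossings count
# assumes: NIC DMA write into a kernel buffer, a read + write for the
# copy to user space, and one application read -- an assumption, not a
# measured figure.

def memory_traffic_gbytes(wire_gbit, crossings_per_byte=4):
    wire_gbytes = wire_gbit / 8.0          # Gbit/s on the wire -> GB/s
    return wire_gbytes * crossings_per_byte

for rate in (10, 20, 40, 56):
    print(f"{rate:2d} Gbit/s wire rate -> "
          f"~{memory_traffic_gbytes(rate):.0f} GB/s of memory traffic")
```

Compare those figures against the practical memory bandwidth of the node: at 40-56 Gbit/s the copies alone start competing seriously with whatever the application itself is doing, which is the non-linearity described above.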

This is why RDMA was invented. On InfiniBand it's referred to as just RDMA; on Ethernet it's called RoCE (RDMA over Converged Ethernet). It's designed to provide memory-to-memory transfer and bypass the CPU for everything except setting up the transfer. The iSER protocol (iSCSI Extensions for RDMA) uses it. However, there is still a bottleneck: to do RDMA you need to pin memory regions so they can't be paged to disk during the transfer, and that registration takes time. On FDR we were getting around 6 GB/s of throughput, which is still shy of the 56 Gbit/s line rate once overheads are taken into account.
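A quick sanity check on that 6 GB/s figure, using the standard FDR lane rate and encoding (the 6 GB/s observed number is the one quoted above; everything else is just arithmetic):

```python
# Sanity check on the ~6 GB/s figure quoted for a 4x FDR link.

lanes     = 4
lane_gbit = 14.0625          # FDR signalling rate per lane, Gbit/s
encoding  = 64.0 / 66.0      # FDR uses 64b/66b encoding

data_rate_gbit  = lanes * lane_gbit * encoding
data_rate_gbyte = data_rate_gbit / 8.0
observed_gbyte  = 6.0

print(f"theoretical payload rate: {data_rate_gbit:.1f} Gbit/s "
      f"({data_rate_gbyte:.2f} GB/s)")
print(f"observed 6 GB/s is ~{observed_gbyte / data_rate_gbyte:.0%} of that; "
      f"the rest goes to protocol headers, memory registration, etc.")
```

So roughly 6.8 GB/s is the best a 4x FDR link can deliver in payload, and 6 GB/s is close to 90% of that.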

Generally, people want to stick with a known technology like Ethernet, but for the highest possible performance InfiniBand is pretty fantastic. Both are quite expensive, though Ethernet may have the edge on cost.

hookenz