I'm trying to compare latencies of different node interconnects for a cluster. The goal is to minimize the memory access latency.
I have obtained some benchmarks for one hardware implementation of a NUMA architecture with many CPUs. They indicate that:
- The latency of an access to memory attached directly to the CPU's own socket is about 90 ns.
- The latency of an access to memory attached to another CPU's socket, which is connected to the first socket by UPI, is about 140 ns (so one UPI "hop" adds about 50 ns).
- The latency of a memory access over the NUMA interconnect in question is 370 ns (so one "hop" of this interconnect adds about 280 ns).
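To make the per-hop arithmetic explicit, here is a minimal sketch that derives the added latency of each hop from the three figures above. The constant and function names are my own, chosen only for illustration, and the sketch assumes that the extra latency of a remote access is simply the remote latency minus the local one.

```python
# Latencies taken from the benchmark figures quoted above, in nanoseconds.
LOCAL_NS = 90               # memory attached to the CPU's own socket
UPI_NS = 140                # memory behind one UPI hop
NUMA_INTERCONNECT_NS = 370  # memory behind the NUMA interconnect


def added_latency_ns(remote_ns, local_ns=LOCAL_NS):
    """Extra latency of a remote access compared with a local one."""
    return remote_ns - local_ns


upi_hop_ns = added_latency_ns(UPI_NS)
numa_hop_ns = added_latency_ns(NUMA_INTERCONNECT_NS)

print(f"UPI hop adds {upi_hop_ns} ns")
print(f"NUMA interconnect hop adds {numa_hop_ns} ns")
```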
NUMA interconnects are quite specialized solutions that cannot be used with hardware from most vendors. The "standard" interconnects are InfiniBand, Ethernet and Fibre Channel.
I'm looking for the latencies these interconnects provide for memory accesses.
For example, the specification of one EDR InfiniBand switch states that it offers "90ns port-to-port latency". If I understand correctly, port-to-port latency is the latency introduced by the switch itself. To this we should add the NIC latency, which is about 600 ns per NIC (according to this), giving about 90 + 2 × 600 = 1290 ns of interconnect-related latency. (BTW, the value of 600 ns seems suspiciously high compared to 90 ns. Why is it so high?)
We should also expect some latency from the cables (passive copper or optical fiber). I guess it depends on the cable length, but I'm not sure of its order of magnitude. Light travels 1 meter in around 3 ns; is that a good estimate?
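The budget I have in mind so far can be sketched as follows. This is only a rough model under my own assumptions: one switch, two NICs, a one-way trip, and a cable delay computed from the speed of light in a vacuum, which is a lower bound because signals in real copper or fiber travel slower. The `velocity_factor` parameter and all function names are hypothetical, and the time for the remote NIC to actually read the memory is not included.

```python
C_M_PER_S = 299_792_458  # speed of light in a vacuum, meters per second


def cable_delay_ns(length_m, velocity_factor=1.0):
    """Propagation delay over a cable of the given length.

    velocity_factor is the signal speed as a fraction of c; 1.0 gives the
    vacuum lower bound, real cables have a value below 1.0.
    """
    return length_m / (C_M_PER_S * velocity_factor) * 1e9


def one_way_interconnect_ns(switch_ns=90, nic_ns=600, cable_m=0.0,
                            velocity_factor=1.0):
    """One-way latency: sending NIC + switch + receiving NIC + cables.

    Defaults are the switch and NIC figures quoted above. cable_m is the
    total cable length along the path.
    """
    return 2 * nic_ns + switch_ns + cable_delay_ns(cable_m, velocity_factor)


print(f"no cables:   {one_way_interconnect_ns():.0f} ns")
print(f"6 m of cable: {one_way_interconnect_ns(cable_m=6.0):.0f} ns")
```

With these inputs the cable term stays small next to the NIC term for cable lengths of a few meters, which is why the 600 ns NIC figure matters most to me.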
The missing part is the time the NIC needs to access memory. I guess we should consider two separate cases: with RDMA and via the CPU. Am I missing something else? Is my reasoning above correct?
My main question is: what is the expected latency of accessing memory in a different node of a cluster using "standard" interconnects such as InfiniBand, Ethernet or Fibre Channel?
The reason I'm asking is that I'm trying to decompose the problem I described in Current single system image solutions into smaller sub-problems.