I'm designing a cluster for a small research institute. Our computations require a very large amount of memory, so I'm looking for a solution that gives our applications access to the whole memory pool distributed across the nodes. The access has to be "transparent": we don't want to modify the programs we use, which rules out approaches like explicit RDMA programming. For the same reason, transparent access to other resources on remote nodes, such as GPGPUs, storage, I/O and CPUs, would also be desirable.
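Just to make concrete what "unmodified" means here: code like the following, which simply allocates and touches a buffer larger than any single node's RAM, has to work as-is, with the aggregation layer (hardware or hypervisor) serving the remote pages behind the scenes. The 2 TiB size is only an arbitrary example I picked for illustration.

```c
/* Minimal illustration of "transparent" access: the program just mallocs
 * and touches a buffer assumed to be larger than one node's RAM; the
 * aggregation layer must make this work without any source changes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t size = (size_t)2 << 40;      /* 2 TiB, assumed to exceed one node */
    char *buf = malloc(size);
    if (!buf) {
        perror("malloc");
        return 1;
    }
    memset(buf, 0, size);               /* touch every page */
    printf("allocated and touched %zu bytes\n", size);
    free(buf);
    return 0;
}
```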
I know there are hardware implementations that connect nodes directly through UPI links between CPUs, such as HPE Superdome and Atos BullSequana. There are also software solutions that implement the aggregation through virtualization, such as ScaleMP and TidalScale, which connect nodes over an ordinary Ethernet interconnect and add machine-learning-based prediction of memory usage to improve performance.
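As far as I understand, both classes of solution present a single OS image in which the remote memory shows up as additional NUMA nodes, so something like the libnuma sketch below (node count, per-node memory, distance matrix) is how the aggregated topology becomes visible to software; the reported distances already hint at how "far" the remote memory is. This assumes libnuma is available and that the NUMA-exposure assumption holds for the particular product.

```c
/* Sketch: dump the NUMA topology of an aggregated single-image machine.
 * Build with: gcc -o topo topo.c -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    int nodes = numa_num_configured_nodes();
    printf("configured NUMA nodes: %d\n", nodes);

    for (int i = 0; i < nodes; i++) {
        long long freemem = 0;
        long long total = numa_node_size64(i, &freemem);
        printf("node %d: %lld MiB total, %lld MiB free\n",
               i, total >> 20, freemem >> 20);
    }

    printf("distance matrix (10 = local):\n");
    for (int i = 0; i < nodes; i++) {
        for (int j = 0; j < nodes; j++)
            printf("%4d", numa_distance(i, j));
        printf("\n");
    }
    return 0;
}
```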
A similar question was asked here some time ago (Alternative to ScaleMP?), but it seems the market has changed drastically since then.
I have two questions:
- What is the performance difference between the hardware and software solutions, especially in terms of memory access latency? (A way to measure this on a given machine is sketched after this list.)
- Are there any other hardware or software solutions currently available that provide the described functionality?
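For the first question, on any machine I can get access to I would measure the latency with a dependent-load (pointer-chasing) microbenchmark whose buffer is placed on a chosen NUMA node, roughly like the sketch below: it pins execution to node 0 and only moves the memory, so the per-load time for a "remote" node approximates the added latency of the aggregated memory. This is only a rough sketch (single run, no huge pages, rand()-based shuffle), not a proper benchmark, and real-world numbers from either class of system would still be very welcome.

```c
/* Rough pointer-chase latency probe for one NUMA node.
 * Build with: gcc -O2 -o chase chase.c -lnuma
 * Run as:     ./chase <node>   and compare nodes. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <numa.h>

#define N     (64UL * 1024 * 1024)     /* pointer slots, ~512 MiB */
#define STEPS (100UL * 1000 * 1000)    /* dependent loads to time  */

int main(int argc, char **argv)
{
    if (numa_available() < 0) { fprintf(stderr, "no NUMA\n"); return 1; }
    int node = (argc > 1) ? atoi(argv[1]) : 0;

    numa_run_on_node(0);               /* keep CPUs fixed, vary only memory */

    size_t *buf = numa_alloc_onnode(N * sizeof(size_t), node);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    /* Sattolo's algorithm: a single-cycle random permutation, so the chase
     * visits every slot and defeats hardware prefetching. */
    for (size_t i = 0; i < N; i++) buf[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t idx = 0;
    for (size_t s = 0; s < STEPS; s++) idx = buf[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("node %d: %.1f ns per dependent load (idx=%zu)\n",
           node, ns / STEPS, idx);

    numa_free(buf, N * sizeof(size_t));
    return 0;
}
```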