If you want to find some "optimal" solution, you first have to work out how much data can pass through your buses. The ideal would be that the data arrives at one NIC, gets read by the CPU, and is zipped directly into the other NIC's memory.
Let's assume you have:
- two 10 GB/s Ethernet cards (separate ones for sending and receiving, so the two directions don't contend), each with 8 lanes of PCI Express 4.0 (~15 GB/s per card), assuming you can find a motherboard that supports this
- a CPU+motherboard combination that can actually keep up, e.g. a Ryzen 9 5950X on an X570 board
Then there is the memory throughput:
- the above system offers around 54 GB/s read/write, or 48 GB/s copy
- the Ethernet drivers might copy the received data to RAM (through the cache or not)
- most likely you will not get the benefit of zero-copy sending; more realistically, sending causes 3-6 copies
- receiving is likely the same, except the CPU has to read the data at least once to zip it (from cache if you are lucky, from memory if not). The same applies to the zipped result: if it is still in cache, the NIC could copy it directly into its internal send buffer.
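Zero-copy sending is worth chasing where you can get it. On Linux, `sendfile(2)` moves file data into a socket without bouncing it through user-space buffers, and Python exposes it as `socket.sendfile` (falling back to a plain `send()` loop where the syscall is unavailable). A minimal sketch, using a socketpair as a stand-in for a real TCP connection:

```python
import socket
import tempfile

# The payload never enters this process's user-space buffers on the send
# side; the kernel moves it from the file straight into the socket.
sender, receiver = socket.socketpair()  # stand-in for a TCP connection

with tempfile.TemporaryFile() as f:
    f.write(b"payload" * 1024)
    f.seek(0)
    sent = sender.sendfile(f)  # sendfile(2) on Linux, send() fallback elsewhere

data = receiver.recv(sent)
sender.close()
receiver.close()
```

This only helps for the send path of data you already have in a file or can map; the receive path still needs the copies discussed above.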
So in the best case, with the best NIC, user-level drivers, and every access hitting cache:
- copy from NIC to cache without writeback to RAM
- zip from cache to cache
- copy from cache to NIC
- assuming 32 threads can feed it
You could get 10 GB/s throughput
If you do not get this lucky scenario, you are more likely to be memory-bandwidth limited:
- assume 3 copies to move the data to the application
- 1 read to feed the zip and 1 write to store the result
- 3 copies to send
- and a lot of cache writebacks to bind it all in darkness
Assuming 7 reads and 7 writes per byte, the limit should be around 54 GB/s / 14 ≈ 3.85 GB/s in the best case. With fewer reads/writes you quickly run into the NIC's maximum speed instead.
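The arithmetic above can be sketched as a small model. The 54 GB/s and 10 GB/s figures are the assumptions from this answer, not measurements:

```python
MEM_BW_GBS = 54.0  # read/write bandwidth assumed for the 5950X system above
NIC_GBS = 10.0     # per-direction NIC speed assumed above

def throughput_limit(reads_per_byte: int, writes_per_byte: int) -> float:
    """Each payload byte crosses the memory bus (reads + writes) times."""
    return MEM_BW_GBS / (reads_per_byte + writes_per_byte)

# Worst case: 3 copies in, a read and a write to zip, 3 copies out.
print(f"worst case: {throughput_limit(7, 7):.2f} GB/s")  # -> 3.86 GB/s

# Best case: everything stays in cache, so the NICs become the bottleneck.
print(f"best case: {min(NIC_GBS, throughput_limit(1, 1)):.2f} GB/s")  # -> 10.00 GB/s
```

Plug in your own copy counts to see where your setup lands between the two cases.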
So from here you can cut down on specs until your budget or needs are met.
I have not been able to locate any benchmark data for in-memory multithreaded zipping.
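Lacking published numbers, you can measure it on your own machine. A rough sketch (assumes CPython, whose `zlib` module releases the GIL while compressing, so plain threads scale; chunk size, compression level, and the test payload are arbitrary choices):

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

def measure(data: bytes, threads: int, chunk: int = 1 << 20) -> float:
    """Compress `data` in `chunk`-sized blocks on `threads` threads;
    return throughput over the uncompressed input in GB/s."""
    blocks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # level 1 favours speed over ratio, matching a streaming use case
        list(pool.map(lambda b: zlib.compress(b, 1), blocks))
    return len(data) / (time.perf_counter() - start) / 1e9

if __name__ == "__main__":
    # 32 MiB of mildly compressible data as a stand-in for real traffic
    payload = bytes(range(256)) * (1 << 17)
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} threads: {measure(payload, n):.2f} GB/s")
```

Run it with your real data and your real compressor; zeros or random bytes give wildly misleading numbers in both directions.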