I am encountering a strange performance problem on Windows Server 2008 R2 Enterprise SP1.
Here is the setup:
- Many processes listen to distinct multicast UDP streams (5 multicasts per process), all bound to a single NIC
- Across processes, all multicasts use the same port range but different multicast IPs (an important detail, since every multicast receiver on a given port ends up on a reused socket address)
- Each process receives 10 Mbit/s of multicast traffic
- RSS is enabled on the NIC, maximum offload settings are set on the NIC and in the OS, and MSI is activated
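For reference, one listener of the kind described above can be sketched roughly as follows. This is only a minimal sketch, not our actual code: the function names `make_mreq`, `open_receiver` and `drain` are made up for illustration, and it assumes one socket per group/port pair bound to the wildcard address with `SO_REUSEADDR`.

```python
import socket
import struct


def make_mreq(group: str, iface_ip: str) -> bytes:
    """Pack an ip_mreq: 4-byte group address followed by 4-byte local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_receiver(group: str, port: int, iface_ip: str) -> socket.socket:
    """Open one UDP socket that shares `port` with other receivers and joins `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Address reuse lets several processes bind the same UDP port (assumed setup).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # wildcard local address
    # Join the group on the NIC identified by its local IPv4 address.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, iface_ip))
    return sock


def drain(sock: socket.socket, bufsize: int = 65535) -> None:
    """Read and discard datagrams forever; the listener does no other work."""
    while True:
        sock.recv(bufsize)
```

A process would call something like `drain(open_receiver("239.1.1.1", 5000, "192.168.1.10"))` once per stream; the group, port and interface address here are placeholders, not our real values.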
Behaviour:
- Below 17 listening processes (about 85 joined UDP multicasts), kernel CPU impact is negligible.
- Between 17 and 22 listeners (about 110 joined UDP multicasts), kernel CPU usage begins to grow slowly but remains acceptable.
- Above 25 listeners, each additional joined multicast has a huge impact on kernel CPU time, and this affects all RSS-bound CPUs.
- CPU time used per listening process is near 0 (expected, since the processes do nothing but read the multicast), so the real problem lies in an OS component.
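To make the thresholds easier to compare, here is the arithmetic behind them as a tiny sketch. It assumes every process joins exactly 5 multicasts and receives 10 Mbit/s, as in the setup; the helper name `load_for` is made up.

```python
def load_for(processes: int, streams_per_process: int = 5,
             mbit_per_process: int = 10) -> tuple:
    """Return (joined multicasts, aggregate receive rate in Mbit/s) for a process count."""
    return processes * streams_per_process, processes * mbit_per_process
```

With these assumptions, 17 processes give 85 joins at 170 Mbit/s, 22 give 110 joins at 220 Mbit/s, and 25 give 125 joins at 250 Mbit/s in total.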
What we found:
- Changing the NIC hardware has no impact on the behaviour (tested on an HP NC382i, a Broadcom-based NIC, and an HP NC365T, a quad-port gigabit Intel-based NIC)
- Global receive bandwidth is not the limiting factor (a single 500 Mbit/s stream does not trigger the CPU load)
- Reading from the multicast sockets does not seem to be the limiting factor (we ran the test with dumb join-only processes on the multicast streams and reproduced the CPU load problem)
- Splitting the multicast traffic across two NICs seems to limit the CPU load and spread it better. However, this is not an option for our use case.
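The join-only test can be approximated with something like the sketch below. It is an illustration under assumptions rather than the exact tool we used: `groups_for_process`, `join_only` and `hold` are hypothetical names, and the base address `239.1.0.1` is a placeholder. Each process joins its own block of groups and then never calls `recv()`.

```python
import ipaddress
import socket
import time


def groups_for_process(index: int, per_process: int = 5,
                       base: str = "239.1.0.1") -> list:
    """Give process `index` its own block of consecutive multicast group addresses."""
    start = int(ipaddress.IPv4Address(base)) + index * per_process
    return [str(ipaddress.IPv4Address(start + i)) for i in range(per_process)]


def join_only(groups, port: int, iface_ip: str) -> list:
    """Join every group on its own socket and return the sockets without reading them."""
    socks = []
    for group in groups:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        # ip_mreq: group address followed by the local interface address.
        mreq = socket.inet_aton(group) + socket.inet_aton(iface_ip)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        socks.append(s)
    return socks


def hold(socks, seconds: float) -> None:
    """Keep the memberships alive for a while, then close; recv() is never called."""
    time.sleep(seconds)
    for s in socks:
        s.close()
```

Starting N copies, each with `hold(join_only(groups_for_process(k), 5000, "192.168.1.10"), 600)` for its own `k`, gives N processes that only hold group memberships; the port and interface address are placeholders.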
Problem:
- We need to be able to listen to at least about 500 multicast streams, and maybe up to 750
- The same hardware running XP does not show this behaviour in kernel CPU time
Suspected component:
- NDIS.sys seems to be a good candidate for explaining the CPU usage increase.
Has anyone encountered such a problem, and could you give me some direction to investigate? I've read everything I could find about Windows Server 2008 network performance tuning, but it all seems to be about TCP traffic. I've also tested all the optimizations that can be applied via the registry or the netsh command.