Short: NLB doesn't care.
NLB relies on all the nodes seeing all the incoming traffic; each node then drops any traffic it's not interested in. That happens before the app even gets to see the traffic - it's not based on app health, not based on response time, not based on fairness or queueing.
NLB doesn't give a rat's donkey about your application. User-mode problems are so user-mode!
As long as a node's network stack is able to send and receive NLB heartbeat broadcasts (or multicasts!), that node will keep on accepting traffic.
Longer: NLB really doesn't care.
NLB runs as a Layer 2 NDIS filter. It simply runs its hash algorithm across all incoming traffic covered by its port rules, and accepts the share that hashes to it.
Every node must see all incoming traffic (the switch floods or multicasts it to every node), and every node drops packets that don't meet its hash criteria.
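To make the "everyone sees everything, everyone drops what isn't theirs" idea concrete, here's a rough sketch in Python. It is not NLB's actual algorithm: the real hash lives inside the driver, and CRC32 over the client IP and port is just a stand-in I picked for illustration. The function names are made up too.

```python
import zlib
from collections import Counter


def owner(client_ip, client_port, node_count):
    """Map a client endpoint to exactly one node index.

    CRC32 is a stand-in (assumption); NLB's real hash is internal to the driver.
    """
    key = f"{client_ip}:{client_port}".encode()
    return zlib.crc32(key) % node_count


def node_accepts(node_index, client_ip, client_port, node_count):
    """Every node runs the same hash and keeps only its own share."""
    return owner(client_ip, client_port, node_count) == node_index


if __name__ == "__main__":
    nodes = 3
    packets = [(f"10.0.{i // 250}.{i % 250 + 1}", 40000 + i) for i in range(3000)]
    accepted = Counter()
    for ip, port in packets:          # every node "sees" every packet...
        for n in range(nodes):
            if node_accepts(n, ip, port, nodes):
                accepted[n] += 1      # ...but only one of them keeps it
    print(accepted)
```

Note what's missing from the decision: nothing in `node_accepts` looks at the app, its health, or its load. The split is purely statistical.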
NLB is a statistical load balancer. Not a caring one.
- It's not important to it that your app is throwing 500s.
- It's oblivious to the fact that your user-mode listener process has crashed.
- It's unaware that your app is running slowly.
It just. Doesn't. Care.
The best-case failure for NLB is one of:
- the box to die completely from a power failure
- the network cable to be cut or unplugged
  - (possibly by a vacuum cleaner)
  - (or the NIC exploding in a shower of sparks)
  - (or being stolen by a passing thief)
  - (or any form of physical network interruption)
- the box to bluescreen
Any of these stops the network stack from processing incoming packets and from sending "I'm alive!" broadcast messages to the other NLB nodes. That causes the cluster to get all introspective for a little while (seconds), figure out which nodes are still present, and re-converge.
Then, when the remaining nodes are clear on how many of them there are, they'll start their hashing again, and drop any new packets they're not interested in.
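The heartbeat-then-reconverge behaviour described above can be sketched roughly like this. It's a toy model, not NLB's real convergence protocol: the node names, timestamps and timeout are illustrative values, and `surviving_nodes` / `converge` are hypothetical helpers.

```python
def surviving_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose last "I'm alive!" message is recent enough."""
    return sorted(n for n, t in last_heartbeat.items() if now - t <= timeout)


def converge(members):
    """Hand each surviving node its hash bucket index."""
    return {name: idx for idx, name in enumerate(members)}


if __name__ == "__main__":
    # Node B lost power at t=10; A and C are still chattering away.
    last_heartbeat = {"A": 15.0, "B": 10.0, "C": 15.5}
    members = surviving_nodes(last_heartbeat, now=16.0, timeout=5.0)
    print(members)            # B has dropped out of the membership
    print(converge(members))  # the remaining nodes re-split the traffic
```

The only input is "did a heartbeat arrive recently?". A node whose app has crashed but whose network stack is still up keeps sending heartbeats, so it keeps its bucket and keeps accepting traffic.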
If you need
- health monitoring,
- careful application of load to underutilized servers,
- and response-time-based intelligent decisions
NLB is not the solution for you. If your app is OK with that, or knows enough about NLB to run NLB STOP when there's a problem at the app layer, then it's probably fine. But very few apps do that (ISA/TMG spring to mind).
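If you want to bolt that "run NLB STOP when the app is sick" behaviour on yourself, a watchdog along these lines is one way to sketch it. Everything specific here is an assumption: the health URL is hypothetical, and the `nlb.exe stop` command is just the "NLB STOP" mentioned above spelled as a command line, so check what the control utility (or PowerShell cmdlet) is on your Windows version before relying on it.

```python
import http.client
import subprocess
import urllib.request

HEALTH_URL = "http://localhost/healthcheck"  # hypothetical endpoint on this node
STOP_COMMAND = ["nlb.exe", "stop"]           # assumption: NLB control utility on this host


def app_is_healthy(url=HEALTH_URL, timeout=5.0):
    """Return True only if the app answers the probe with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts and HTTP error statuses (e.g. 500).
        return False


def check_once(probe=app_is_healthy, run=subprocess.run):
    """Probe the app once; take this node out of the cluster if it's unhealthy.

    Returns True if the node was left in the cluster, False if it was stopped.
    """
    if probe():
        return True
    run(STOP_COMMAND, check=False)
    return False
```

You'd call `check_once()` on a timer (a scheduled task, say). `probe` and `run` are parameters so the logic can be exercised without a real app or a real NLB node. Starting the node again once the app recovers is left out of this sketch.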
Looks like I wrote about this a while ago here.
Looking Elsewhere
If you're looking for a low-cost (read: free) Windows-based solution, consider Application Request Routing for IIS 7+ - it has most of the health monitoring features it sounds like you're looking for.
Generally, though, you wouldn't run it on the same box as the applications. For availability, you'd typically want to run NLB underneath ARR: NLB gives you network-level availability for the ARR load balancers, and ARR gives you the app-layer smarts.