I'm looking for common methods for calculating overall system availability when an app runs on a cluster and isn't fault tolerant, i.e. if any one node of the cluster is down, the app fails.
When adding nodes one at a time starting from 1, I can always multiply. E.g. going from 1 node to 2 nodes, if each is rated 99.99%, the resulting 2-node cluster is roughly 99.98% (ignoring connectivity for now). Each additional node is just one more multiplication (I think).
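To make that concrete, here is a minimal Python sketch of the multiplication I'm doing now. It assumes node failures are independent and that every node has the same rating. The function name `series_availability` is just something I made up for illustration.

```python
def series_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that fails if any single node is down.

    Assumes independent node failures and an identical rating per node
    (both are assumptions for this sketch). Connectivity is ignored.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    availability = 1.0
    for _ in range(nodes):
        # Every node must be up, so multiply in one more node's availability.
        availability *= node_availability
    return availability


# Work the numbers for a few cluster sizes at a 99.99% per-node rating.
for n in (1, 2, 128, 256, 320):
    print(n, f"{series_availability(0.9999, n):.4%}")
```

This is the "work it out from 1 up to x" approach: the cost grows with the node count, which is what I'd like to avoid.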
How about going from 256 nodes to 320? Or from 256 down to 128? Surely there's a better way than working the numbers from 1 to x based on the rating of a single node? It's easy enough to do, but I'm hoping someone has a better approach.
Thanks.