
I'm trying to find common methods for calculating overall system availability when an app runs on a cluster and the app isn't fault tolerant, i.e., if any one node of the cluster is down, the app fails.

When adding nodes one at a time starting from 1, I can just multiply. E.g., going from 1 node to 2 nodes, if each is rated 99.99%, the resulting 2-node cluster is roughly 99.98% (ignoring connectivity for now). Each additional node is just more of the same simple math (I think).

How about going from 256 nodes to 320? Or from 256 to 128? Surely there's a better way than working the numbers from 1 to x based on the rating of a single node? It's easy enough to do, but I'm hoping someone has a better approach.

Thanks.

SQLmojoe

1 Answer


If all nodes have the same reliability, the total reliability is easy to calculate: R = r^n, where r is the reliability of a single node and n is the number of nodes.
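A minimal Python sketch of that formula, assuming every node has the same availability r (as a fraction between 0 and 1), nodes fail independently, and shared components such as the network are ignored. The function name `cluster_availability` is just an illustrative choice.

```python
def cluster_availability(r: float, n: int) -> float:
    """Availability of an n-node cluster that is down if ANY node is down.

    Assumes (hypothetically) identical, independent nodes with availability r;
    connectivity and other shared components are ignored.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a fraction between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Every node must be up at the same time, so multiply r by itself n times.
    return r ** n


if __name__ == "__main__":
    # Print the cluster availability for the sizes discussed in the question.
    for nodes in (1, 2, 128, 250, 256, 320):
        print(f"{nodes:>4} nodes: {cluster_availability(0.9999, nodes):.4%}")
```

Because it's a single exponentiation, there's no need to step through the sizes from 1 up to n.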

So, with 99.99% per node, 250 nodes give you ~97.5%, and 320 nodes give ~96.85%.
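To go directly from one cluster size to another (e.g. 256 to 320, or 256 to 128) without going back to a single node, the same equal-nodes assumption gives a simple rescaling. This is a short derivation from R = r^n, with the numbers worked out for r = 0.9999:

```latex
R_n = r^n \;\Rightarrow\; r = R_n^{1/n} \;\Rightarrow\; R_m = r^m = R_n^{m/n},
\qquad R_{n+k} = R_n \, r^k

R_{256} = 0.9999^{256} \approx 0.9747

R_{320} = R_{256}^{320/256} = 0.9999^{320} \approx 0.9685,
\qquad
R_{128} = R_{256}^{128/256} = \sqrt{R_{256}} \approx 0.9873
```

So halving the cluster takes the square root of its availability, and adding k nodes multiplies it by r^k.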

If the nodes don't all have the same reliability, it's R = ∏ r_i (i = 1..n), i.e. the product of the individual node reliabilities, which is still easy to calculate in Excel.
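The same product in Python, as a sketch that assumes per-node availabilities are given as fractions and nodes fail independently. `math.prod` needs Python 3.8+; in Excel, `PRODUCT` over the column of per-node values does the same thing. The function name and the 300/20 node mix in the example are hypothetical.

```python
import math
from typing import Iterable


def cluster_availability_mixed(node_availabilities: Iterable[float]) -> float:
    """R = r_1 * r_2 * ... * r_n for a cluster that is down if any node is down.

    Assumes independent node failures; an empty cluster returns 1.0.
    """
    values = list(node_availabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each availability must be a fraction between 0 and 1")
    return math.prod(values)


if __name__ == "__main__":
    # Hypothetical mix: 300 nodes rated 99.99% plus 20 older nodes rated 99.9%.
    nodes = [0.9999] * 300 + [0.999] * 20
    print(f"{cluster_availability_mixed(nodes):.4%}")
```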

Sven