A cluster contains 1500 machines each having 500GB disk capacity. Calculate approximate number of chunk servers, blocks and total available size if default chunk replica is 3 and 5 respectively?
We have 1500 machines-:
1 master
1 shadow master
So number of chunkservers=1500-1-1=1498
Total disk capacity=1498*500 GB
But what's the number of blocks?
And how is default chunk replica important to this?
Because
chunk server=collection of chunks(64 MB)
chunks=collection of blocks (64 KB)
So in 1 machine-:
Number of chunks=500GB/64MB=8000 chunks
Number of blocks in 1 chunk= 64MB/64KB=1024
So total number of blocks in 1 chunkserver=8000*1024
Total number of blocks in 1498 chunkserver=149880001024
Why is replication goal important here? Or is it asking any different "block" that I'm not aware of?
This is what I got feedback from professor about this question-:
Chunk Size = 64MB
if chunk replica is 3 then,
Total blocks = 749000 GB/ (3*64 MB)
If chunk replica is 5 then, Total blocks = 749000 GB/ (5*64 MB)