I have Nutch/Hadoop running fine in pseudo-distributed mode. I want to add processing capacity by adding new nodes that are smaller than the master (their hard disks are three times smaller) and, of course, cheaper.
Since the default HDFS replication factor is 3, I will not get more storage space after balancing the data, but that is not my main concern.
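A rough way to see why the space does not grow: HDFS places the replicas of a block on distinct DataNodes, so with replication 3 a node can hold at most one copy of each block. The sketch below estimates the largest amount of logical data that fits, assuming data splits into arbitrarily small blocks and ignoring reserved space and non-HDFS usage. The function name and the example sizes (one 300 GB master, small 100 GB nodes) are hypothetical.

```python
def max_usable_capacity(node_capacities, replication):
    """Estimate the most logical data storable when each block needs
    `replication` replicas on distinct nodes (idealized, hypothetical model).

    Returns the largest B with sum(min(c, B) for c in node_capacities) >= replication * B:
    each node holds at most one replica per block, so it can contribute
    at most min(capacity, B) replicas of B units of data.
    """
    lo, hi = 0.0, sum(node_capacities) / replication
    for _ in range(100):  # bisection; the feasible B values form an interval [0, B*]
        mid = (lo + hi) / 2
        if sum(min(c, mid) for c in node_capacities) >= replication * mid:
            lo = mid
        else:
            hi = mid
    return lo


print(max_usable_capacity([300, 100, 100], 3))       # master + 2 small nodes
print(max_usable_capacity([300, 100, 100, 100], 3))  # master + 3 small nodes
```

With two small nodes the estimate is about 100, i.e. every block needs a copy on each small node, so the small disks become the limit.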
Do I still get more processing power?
I don't understand how map/reduce tasks interact with replication. How is it decided which node gets the work, given that each block has several replicas?
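As a simplified model of what Hadoop's schedulers do (not their actual code): when a node reports a free map slot, the scheduler prefers a pending task whose input block has a replica on that node (node-local), then one with a replica on the same rack (rack-local), and otherwise hands out any task, whose input is then read over the network. Replicas therefore widen the set of nodes that can run a task locally; they do not stop a node from receiving work. The `MapTask` class and `pick_map_task` function are hypothetical; `/default-rack` is the rack Hadoop assigns when no topology script is configured. Real schedulers add further policies (for example delay scheduling and speculative execution).

```python
from dataclasses import dataclass, field

DEFAULT_RACK = "/default-rack"  # rack used when no topology is configured


@dataclass
class MapTask:
    task_id: str
    replica_hosts: set = field(default_factory=set)  # hosts holding the input block


def pick_map_task(pending, free_host, rack_of):
    """Return (task, locality) for the host that reported a free map slot."""
    # 1. Prefer a task whose input block is stored on this very host.
    for task in pending:
        if free_host in task.replica_hosts:
            return task, "node-local"
    # 2. Otherwise a task with a replica somewhere on the same rack.
    rack = rack_of.get(free_host, DEFAULT_RACK)
    for task in pending:
        if any(rack_of.get(h, DEFAULT_RACK) == rack for h in task.replica_hosts):
            return task, "rack-local"
    # 3. Otherwise any pending task; its input is read from a remote node.
    if pending:
        return pending[0], "off-rack"
    return None, None


tasks = [
    MapTask("m0", {"master", "nodeA", "nodeB"}),
    MapTask("m1", {"master", "nodeA", "nodeC"}),
]
print(pick_map_task(tasks, "nodeC", {}))  # m1, node-local
print(pick_map_task(tasks, "nodeD", {}))  # m0, rack-local (everyone on the default rack)
```

In this model a small node that holds few replicas still gets tasks; it just runs more of them non-locally.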