
We are running Hadoop version 2.6.4.

On one DataNode machine we can see that the HDFS data is not balanced across the disks.

The disks have different used sizes: sdb has 11G used, while sdd has only 3.0G used (17G available):

/dev/sdd 20G 3.0G 17G 15%   /grid/sdd 
/dev/sdb 20G 11G 9.3G 53%   /grid/sdb <-- why is this disk not balanced like sdd? Why do the disks have different used sizes?
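
The imbalance can be quantified directly from the `df` output. Here is a minimal Python sketch, assuming the two `df` lines above are pasted in as text. The function name `used_percent_by_mount` is hypothetical and only serves this illustration.

```python
def used_percent_by_mount(df_lines):
    """Map each mount point to its Use% value from `df -h`-style lines."""
    usage = {}
    for line in df_lines:
        fields = line.split()
        # Expected columns: device, size, used, avail, use%, mount point
        if len(fields) < 6:
            continue
        pct = fields[4].rstrip("%")
        if not pct.isdigit():
            continue  # skip the header row or malformed lines
        usage[fields[5]] = int(pct)
    return usage


# The two lines reported in the question
df_lines = [
    "/dev/sdd 20G 3.0G 17G 15%   /grid/sdd",
    "/dev/sdb 20G 11G 9.3G 53%   /grid/sdb",
]

usage = used_percent_by_mount(df_lines)
spread = max(usage.values()) - min(usage.values())
print(usage)
print(f"spread between fullest and emptiest disk: {spread} percentage points")
```

Running the same check before and after any rebalancing attempt makes it easy to see whether the per-disk spread actually changed.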

After searching on Google I found the following command (from https://community.hortonworks.com/questions/19694/help-with-exception-from-hdfs-balancer.html):

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log
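
The raw numbers in that command are easier to judge in human units. Below is a small Python sketch of the arithmetic. The unit interpretations are assumptions based on the property names and their usual documentation: milliseconds for `movedWinWidth`, bytes per second for `bandwidthPerSec`, and bytes for `max-size-to-move`.

```python
# Values copied from the balancer command; units are assumptions (see comments)
settings = {
    "dfs.balancer.movedWinWidth": 5400000,              # assumed milliseconds
    "dfs.datanode.balance.bandwidthPerSec": 100000000,  # assumed bytes/second
    "dfs.balancer.max-size-to-move": 10737418240,       # assumed bytes
}

window_minutes = settings["dfs.balancer.movedWinWidth"] / 1000 / 60
bandwidth_mib_s = settings["dfs.datanode.balance.bandwidthPerSec"] / 2**20
max_move_gib = settings["dfs.balancer.max-size-to-move"] / 2**30

print(f"moved window:  {window_minutes:.0f} minutes")
print(f"bandwidth cap: {bandwidth_mib_s:.1f} MiB/s")
print(f"max size:      {max_move_gib:.0f} GiB")
```

None of these settings changes what the balancer compares. They only affect how fast and how much it moves once it has decided that something needs to move.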

After running it, the disk usage is unchanged:

/dev/sdd 20G 3.0G 17G 15% /grid/sdd 
/dev/sdb 20G 11G 9.3G 53% /grid/sdb


more /tmp/balancer-out.log
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Mar 7, 2019 5:02:34 PM            0                  0 B                 0 B                0 B
Mar 7, 2019 5:02:34 PM   Balancing took 1.453 seconds

So the HDFS data was not actually rebalanced.

Please advise how to balance the HDFS data so that all disks have the same used size.

shalom

1 Answer

The NameNode considers various parameters before choosing the DataNodes that receive new blocks. Some of the considerations are:

1. Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
2. Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack.
3. One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
4. Spread HDFS data uniformly across the DataNodes in the cluster.

So in your case, one or more of the considerations above may explain the uneven usage.

The Apache balancer command is shown below. Note that it evens out utilization across DataNodes, not across the disks inside a single DataNode.

hdfs balancer [-threshold <threshold>] [-policy <policy>]
 -threshold <threshold>  Percentage of disk capacity. This overwrites the default threshold.
 -policy <policy>        datanode (default): Cluster is balanced if each datanode is balanced.
                         blockpool: Cluster is balanced if each block pool in each datanode is balanced.
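
The threshold rule can be sketched in a few lines of Python. This illustrates the documented rule (a DataNode counts as balanced when its utilization differs from the cluster-wide utilization by no more than the threshold). It is not the balancer's actual code, and the node names and capacities are hypothetical. Utilization is computed per DataNode over all of its disks, so two disks at 15% and 53% on the same node do not by themselves cause any block movement.

```python
def is_cluster_balanced(nodes, threshold_pct=10.0):
    """nodes maps a DataNode name to (used_bytes, capacity_bytes)."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity
    for used, cap in nodes.values():
        node_util = 100.0 * used / cap
        # A node is out of balance if it deviates from the cluster
        # average by more than the threshold (in percentage points).
        if abs(node_util - cluster_util) > threshold_pct:
            return False
    return True


GIB = 2**30

# Hypothetical cluster: every node has two 20 GiB disks
even_nodes = {
    "dn1": ((3 + 11) * GIB, 40 * GIB),  # uneven disks, node at 35%
    "dn2": ((7 + 8) * GIB, 40 * GIB),   # node at 37.5%
}
skewed_nodes = dict(even_nodes, dn3=(36 * GIB, 40 * GIB))  # node at 90%

print(is_cluster_balanced(even_nodes, threshold_pct=20))
print(is_cluster_balanced(skewed_nodes, threshold_pct=20))
```

In the first hypothetical cluster the disks inside `dn1` are uneven, yet every node is within the threshold of the cluster average, which matches a "The cluster is balanced. Exiting..." result.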
asktyagi