>1) I am not sure whether I should start the rebalance on the namenode or on each new datanode.
You can run the balancer on any node. Running it on the namenode is not recommended; it is better to run it on a node with a light load.
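For example, on a lightly loaded node that has the Hadoop client and the cluster configuration installed, a one-off balancer run could be wrapped as below. This is a minimal Python sketch, assuming Hadoop 1.x launches the balancer as `hadoop balancer` and Hadoop 2.x and later as `hdfs balancer`; `balancer_command` and `run_balancer` are hypothetical helper names. The -threshold value is the allowed deviation, in percent, of each datanode's utilization from the cluster average (10 by default).

```python
from __future__ import annotations

import shlex
import subprocess


def balancer_command(threshold_percent: int = 10, hadoop_major_version: int = 1) -> list[str]:
    """Build the argv for a foreground balancer run (hypothetical helper).

    Assumption: Hadoop 1.x uses `hadoop balancer`, Hadoop 2.x+ uses `hdfs balancer`.
    -threshold is the allowed deviation (percent) from the cluster-average utilization.
    """
    if not 1 <= threshold_percent <= 100:
        raise ValueError("threshold_percent must be between 1 and 100")
    launcher = "hadoop" if hadoop_major_version < 2 else "hdfs"
    return [launcher, "balancer", "-threshold", str(threshold_percent)]


def run_balancer(threshold_percent: int = 10, hadoop_major_version: int = 1) -> int:
    """Run the balancer on this node and return its exit code.

    Needs the Hadoop binaries on PATH and the cluster config on this node,
    so that the balancer can reach the namenode.
    """
    argv = balancer_command(threshold_percent, hadoop_major_version)
    print("running:", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode
```

Running it in the foreground like this keeps the progress output on the console; start-balancer.sh instead runs the balancer in the background and writes to a log file, as in your message.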
>2) Should I set the bandwidth on each datanode or only on the namenode?
Each datanode has its own limit on the bandwidth it uses for rebalancing, so the setting has to be made on each datanode, not only on the namenode. The default value is 5 MB/s.
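Because the limit is enforced by each datanode, the property goes into hdfs-site.xml on every datanode, and the datanodes need a restart to pick it up. The value is given in bytes per second. The sketch below, using a hypothetical helper `bandwidth_property`, renders such an entry; it assumes the Hadoop 1.x property name `dfs.balance.bandwidthPerSec` (Hadoop 2.x and later call it `dfs.datanode.balance.bandwidthPerSec`). Check hdfs-default.xml for your release's actual default.

```python
import xml.etree.ElementTree as ET

MIB = 1024 * 1024  # binary megabyte, in bytes


def bandwidth_property(mb_per_sec: float, name: str = "dfs.balance.bandwidthPerSec") -> str:
    """Render an hdfs-site.xml <property> capping balancer traffic per datanode.

    Assumption: `name` defaults to the Hadoop 1.x property name; pass
    "dfs.datanode.balance.bandwidthPerSec" for Hadoop 2.x and later.
    The value is written in bytes per second.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(int(mb_per_sec * MIB))
    return ET.tostring(prop, encoding="unicode")
```

For instance, `bandwidth_property(5)` yields a value of 5242880. Releases that support it can also change the limit at runtime with `hdfs dfsadmin -setBalancerBandwidth <bytes per second>`; that change is not persisted across datanode restarts.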
>3) Once the rebalance starts, will the data on the other nodes decrease?
Yes. While the balancer runs, blocks are moved from over-utilized datanodes to under-utilized ones, so usage on the existing (fuller) nodes goes down as the new nodes fill up.
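To illustrate which nodes lose data: the balancer compares each datanode's utilization with the cluster-wide average and moves blocks from nodes above (average + threshold) to nodes below (average - threshold). The Python sketch below is a simplified version of that classification (the real balancer also tracks nodes that are above or below average but still within the threshold); the function name and the sample numbers are hypothetical.

```python
from __future__ import annotations


def classify_datanodes(used: dict[str, float], capacity: dict[str, float],
                       threshold_percent: float = 10.0) -> dict[str, str]:
    """Label each datanode relative to the cluster-average utilization.

    Simplified sketch: nodes more than `threshold_percent` above the average
    act as sources, nodes more than that below act as targets.
    """
    average = 100.0 * sum(used.values()) / sum(capacity.values())
    labels = {}
    for node, cap in capacity.items():
        utilization = 100.0 * used[node] / cap
        if utilization > average + threshold_percent:
            labels[node] = "over-utilized"   # blocks move away from here
        elif utilization < average - threshold_percent:
            labels[node] = "under-utilized"  # blocks move onto here
        else:
            labels[node] = "within threshold"
    return labels
```

With three existing nodes at 80% and five new empty nodes of equal capacity, the cluster average is 30%, so the old nodes (80% > 40%) are labeled over-utilized and the new ones (0% < 20%) under-utilized.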
>4) Does the log output mean the balancer was killed by another one?
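No. Only one balancer is allowed to run in the cluster at any time, to avoid data corruption. The message "Another balancer is running. Exiting..." means your balancer found another balancer already running and exited by itself; it was not killed.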
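The one-balancer rule is enforced with a marker file: as far as I know, the balancer creates /system/balancer.id in HDFS and refuses to start while another instance holds it. Below is a local-filesystem analogy in Python, not Hadoop's code; `SingleInstanceLock` and `start_balancer` are hypothetical names. An exclusive create makes the second instance report the same message and exit rather than being killed.

```python
import os


class SingleInstanceLock:
    """Allow one holder at a time via an exclusively created marker file.

    Analogy only: a local file stands in for the marker file the HDFS
    balancer keeps in HDFS.
    """

    def __init__(self, path: str):
        self.path = path
        self.fd = None

    def acquire(self) -> bool:
        try:
            # O_EXCL makes creation fail if the marker file already exists.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self.fd, str(os.getpid()).encode())  # record the holder's PID
        return True

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)  # let the next instance start
            self.fd = None


def start_balancer(lock: SingleInstanceLock) -> str:
    if not lock.acquire():
        # The second instance exits on its own; nothing kills it.
        return "Another balancer is running. Exiting..."
    try:
        return "balancing"  # placeholder for the actual block moves
    finally:
        lock.release()
```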
You can refer to the document below for more details.
From: yingnan.ma [[EMAIL PROTECTED]]
Sent: Wednesday, May 30, 2012 7:06 AM
Subject: about rebalance
I added 5 new datanodes and want to rebalance the cluster. I started the balancer on the namenode, and it printed:
"starting balancer, logging to /hadoop/logs/hadoop-hdfs-balancer-hadoop220.out "
Today I checked the log file, and it says:
Another balancer is running. Exiting...
Balancing took 5.0203 minutes
1) I am not sure whether I should start the rebalance on the namenode or on each new datanode.
2) Should I set the bandwidth on each datanode or only on the namenode?
3) Once the rebalance starts, will the data on the other nodes decrease?
4) Does the log output mean the balancer was killed by another one?
If you have any suggestions, please let me know. Thank you.