I am using Hadoop 0.20.2, the append build (r1056497). My question is about balancing. I have a cluster of 5 datanodes, and each node has 2 disks attached. The second disk was added when the first disk was reaching its capacity.
The scenario I am facing is this: when the new disk was added, Hadoop automatically moved some data over to it. Over time, however, I have noticed that data is no longer being written to the second disk. I have also hit a case on one datanode where the first disk reached 100% utilization.
How can I overcome this? Isn't it Hadoop's job to balance disk utilization across multiple disks on a single datanode?
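To make the question concrete, here is a minimal sketch of why placement that simply rotates through the disks does not even out utilization. It assumes the datanode picks volumes round-robin, skipping any volume that cannot fit the block, which is the behaviour HDFS-1804 set out to change. The `Volume` class, the capacities, and the "most free space" policy are all hypothetical illustrations; the latter is a simplification and not the policy from the JIRA patch. The sketch also does not reproduce the exact symptom described above, it only shows how the fuller disk keeps receiving blocks.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical stand-in for one datanode data directory."""
    name: str
    capacity: int  # arbitrary units, e.g. MB
    used: int

    @property
    def free(self) -> int:
        return self.capacity - self.used


def pick_round_robin(volumes, block_size, start):
    """Scan from `start` and return (index, next_start) of the first volume that fits."""
    n = len(volumes)
    for step in range(n):
        i = (start + step) % n
        if volumes[i].free >= block_size:
            return i, (i + 1) % n
    raise RuntimeError("no volume has room for the block")


def pick_most_free(volumes, block_size):
    """Simplified alternative: always pick the volume with the most free space."""
    i = max(range(len(volumes)), key=lambda k: volumes[k].free)
    if volumes[i].free < block_size:
        raise RuntimeError("no volume has room for the block")
    return i


def simulate(policy, blocks, block_size=64):
    """Write `blocks` blocks onto an almost-full disk1 and a nearly empty disk2."""
    vols = [Volume("disk1", 1000, 900), Volume("disk2", 1000, 100)]
    cursor = 0
    for _ in range(blocks):
        if policy == "round_robin":
            i, cursor = pick_round_robin(vols, block_size, cursor)
        else:
            i = pick_most_free(vols, block_size)
        vols[i].used += block_size
    return {v.name: v.used for v in vols}


if __name__ == "__main__":
    print("round robin:", simulate("round_robin", 4))
    print("most free:  ", simulate("most_free", 4))
```

With round-robin, the nearly full disk still receives a block on its turn as long as the block fits, so it reaches capacity while the other disk has plenty of room.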
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop 0.20.2 (we are in the process of upgrading). Is there a short-term workaround to balance the disk utilization? If I apply the patch from the JIRA to the version I am using, will it break anything?
Thanks,
Divye Sheth

On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <[EMAIL PROTECTED]> wrote:
I won't be in a position to fix this by relying on HDFS-1804, since we are upgrading to CDH4 in the coming month. I just want a short-term solution. I have read somewhere that manually moving the blocks would help. Could someone guide me through the exact steps and the precautions I should take while doing this? Data loss is a no-no for me.
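As a cautious starting point for the manual approach, here is a dry-run sketch that only plans which block files could be moved off the full disk and never touches them. It assumes block files are named `blk_<id>` with a sibling `blk_<id>_<genstamp>.meta` file somewhere under the data directory; the function names and the example path are hypothetical, so check the layout on your own datanode first. The commonly cited manual procedure is to stop the datanode, move each block together with its `.meta` file to the other disk's data directory, and restart; please verify that against the documentation for your version before moving anything.

```python
import os
import re

# Assumed naming: "blk_<id>" for data, "blk_<id>_<genstamp>.meta" for checksums.
BLOCK_RE = re.compile(r"^blk_-?\d+$")


def find_block_pairs(volume_dir):
    """Yield (block_path, meta_path, total_bytes) for blocks with exactly one .meta sibling."""
    for root, _dirs, files in os.walk(volume_dir):
        names = set(files)
        for name in sorted(names):
            if not BLOCK_RE.match(name):
                continue
            metas = [f for f in names
                     if f.startswith(name + "_") and f.endswith(".meta")]
            if len(metas) != 1:
                continue  # skip anything ambiguous rather than guess
            block = os.path.join(root, name)
            meta = os.path.join(root, metas[0])
            size = os.path.getsize(block) + os.path.getsize(meta)
            yield block, meta, size


def plan_moves(source_dir, bytes_to_free):
    """Return (pairs, freed_bytes): enough block/meta pairs to free the requested space."""
    plan, freed = [], 0
    for block, meta, size in find_block_pairs(source_dir):
        if freed >= bytes_to_free:
            break
        plan.append((block, meta))
        freed += size
    return plan, freed


if __name__ == "__main__":
    # Hypothetical data directory of the full disk; nothing is moved, only listed.
    pairs, freed = plan_moves("/data1/dfs/data/current", 10 * 1024 ** 3)
    for block, meta in pairs:
        print(block, meta)
    print("would free", freed, "bytes")
```

Keeping this as a plan that you review by hand means a block is never separated from its `.meta` file by accident, and nothing changes on disk until you decide to act on the list.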
Thanks,
Divye Sheth

On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
You can write a simple tool to move blocks peer to peer. I had such a tool before, but I cannot find it now.
Background: our cluster is not balanced and the balancer is very slow, so I wrote this tool to move blocks from one node to another.

On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <[EMAIL PROTECTED]> wrote:
It doesn't need any downtime. It works just like the Balancer, except that this tool moves blocks peer to peer: you specify a source node and a destination node, then start it.

On Wed, Mar 5, 2014 at 5:12 PM, divye sheth <[EMAIL PROTECTED]> wrote: