Re: Regarding moving specific blocks of data in HDFS
Andrew Wang 2013-12-18, 23:23
I haven't checked 1.0.4, but in 2.2.0 and onwards, there's this setting you
can tweak up:
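By default, it's set to just 1 MB/s, which is pretty slow: at that rate a
60 MB block needs about 60 seconds, which matches what you're seeing. Again,
at least in 2.2.0, there's also `hdfs dfsadmin -setBalancerBandwidth`, which
takes a value in bytes per second and can be used to adjust this config
property at runtime.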
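The short Python sketch below checks the reply against the numbers in the quoted question. It computes the effective throughput of the block move and of the `dfs -put`, then builds a `-setBalancerBandwidth` command from the result. It assumes 1 MB = 1,048,576 bytes, and it assumes the bandwidth cap, not disk or network, is what limits the block move. The helper names are hypothetical. The command uses the 2.2.0 syntax mentioned above, not necessarily what 1.0.4 accepts.

```python
"""Back-of-the-envelope check: is the balancer bandwidth cap the bottleneck?

Assumes 1 MB = 1,048,576 bytes, so the 1 MB/s default is 1048576 bytes/s.
Helper names are hypothetical.
"""

MIB = 1024 * 1024
DEFAULT_CAP_BYTES_PER_SEC = 1 * MIB  # the 1 MB/s default mentioned in the reply


def throughput_mib_per_sec(size_mib: float, seconds: float) -> float:
    """Effective throughput of a transfer, in MB/s."""
    return size_mib / seconds


def set_balancer_bandwidth_command(mib_per_sec: float) -> str:
    """Build the 2.2.0-style dfsadmin command; its argument is bytes per second."""
    return f"hdfs dfsadmin -setBalancerBandwidth {int(mib_per_sec * MIB)}"


if __name__ == "__main__":
    # Figures from the question: a ~60 MB block.
    block_move = throughput_mib_per_sec(60, 60)  # via the data transfer protocol
    dfs_put = throughput_mib_per_sec(60, 1.4)    # same data via dfs -put
    cap = DEFAULT_CAP_BYTES_PER_SEC / MIB
    print(f"block move: {block_move:.1f} MB/s (default cap: {cap:.1f} MB/s)")
    print(f"dfs -put:   {dfs_put:.1f} MB/s")
    # For put-like speed, the cap should be at least the put throughput
    # (assuming disk and network can keep up).
    print(set_balancer_bandwidth_command(round(dfs_put)))
```

Running it reports 1.0 MB/s for the block move, which equals the default cap, and about 42.9 MB/s for `dfs -put`. It then prints a command for a 43 MB/s cap (45088768 bytes per second). Treat that figure as a starting point only. A higher cap moves the bottleneck to disk and network, and those also carry job traffic.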
On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <[EMAIL PROTECTED]> wrote:
> Hi all,
> I am working on a research project where we are looking at algorithms to
> "optimally" distribute data blocks in HDFS nodes. The definition of what is
> optimal is omitted for brevity.
> I want to move specific blocks of a file that is *already* in HDFS. I am
> able to do this using the data transfer protocol (taking cues from the
> Balancer module), but the operation is very time-consuming. In my
> cluster setup, to move 1 block of data (approximately 60 MB) from
> data-node-1 to data-node-2 it takes nearly 60 seconds. A "dfs -put"
> operation that copies the same file from data-node-1's local file system to
> data-node-2 takes just 1.4 seconds.
> Any suggestions on how to speed up the movement of specific blocks?
> Bringing down the running time is very important for us because this
> operation may happen while executing a job.
> I am using Hadoop 1.0.4.
> Thanks in advance!
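A small simulation of a period-based throttler shows why a Balancer-style copy runs at almost exactly the cap rate. Each period grants a fixed byte budget, and once that budget is spent, the sender waits for the next period. This is a simplified model, loosely in the spirit of how HDFS throttles balancer transfers, and it is not Hadoop's actual code. It assumes infinitely fast disks and links, a 500 ms period, and 64 KB packets. All names are hypothetical.

```python
"""Simulate a period-based bandwidth throttler on a virtual clock.

Simplified model, not Hadoop's implementation. Assumes disks and links are
infinitely fast, so waiting on the throttle is the only thing that takes time.
All names here are hypothetical.
"""

MIB = 1024 * 1024


class VirtualThrottler:
    """Grants at most bytes_per_period bytes in each period of period_ms."""

    def __init__(self, bandwidth_bytes_per_sec: int, period_ms: int = 500) -> None:
        self.period_ms = period_ms
        self.bytes_per_period = bandwidth_bytes_per_sec * period_ms // 1000
        if self.bytes_per_period <= 0:
            raise ValueError("bandwidth too low for the chosen period")
        self.now_ms = 0  # virtual time elapsed so far
        self.reserve = self.bytes_per_period  # budget left in the current period

    def throttle(self, num_bytes: int) -> None:
        """Charge num_bytes to the budget; wait out whole periods while it is spent."""
        self.reserve -= num_bytes
        while self.reserve <= 0:
            # Budget used up: advance the clock to the next period and refill.
            self.now_ms += self.period_ms
            self.reserve += self.bytes_per_period


def simulate_copy_ms(block_bytes: int, bandwidth_bytes_per_sec: int,
                     packet_bytes: int = 64 * 1024) -> int:
    """Return virtual milliseconds needed to push block_bytes through the throttler."""
    throttler = VirtualThrottler(bandwidth_bytes_per_sec)
    sent = 0
    while sent < block_bytes:
        chunk = min(packet_bytes, block_bytes - sent)
        sent += chunk
        throttler.throttle(chunk)  # pay for the packet after sending it
    return throttler.now_ms


if __name__ == "__main__":
    for mib_per_sec in (1, 2, 10):
        ms = simulate_copy_ms(60 * MIB, mib_per_sec * MIB)
        print(f"{mib_per_sec:>3} MB/s cap -> {ms / 1000:.1f} s for a 60 MB block")
```

With a 1 MB/s cap, `simulate_copy_ms(60 * MIB, 1 * MIB)` returns 60000 ms, which is the ~60 seconds reported in the question. Doubling the cap halves the time to 30000 ms. The clock only advances in whole periods, so the model is coarse for fast transfers.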