Re: Regarding moving specific blocks of data in HDFS
Hi Karthiek,

I haven't checked 1.0.4, but in 2.2.0 and onwards there's a setting you
can raise:

dfs.datanode.balance.bandwidthPerSec

By default it's set to just 1MB/s, which is pretty slow. Again, at least in
2.2.0, there's also `hdfs dfsadmin -setBalancerBandwidth`, which takes a
value in bytes per second and adjusts this limit at runtime.
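
As a rough sanity check, the numbers in the original message are consistent with that throttle: 60 MB at 1MB/s is about 60 seconds. The sketch below only does this arithmetic. The 100MB/s target is a hypothetical value chosen for illustration, not a recommendation, and it assumes the throttle is the only bottleneck.

```python
# Back-of-the-envelope estimate of block transfer time under the balancer throttle.
# Assumption: the throttle is the only limiting factor (ignores network and disk).

MIB = 1024 * 1024  # bytes in one mebibyte


def transfer_seconds(block_bytes, bandwidth_bytes_per_sec):
    """Time to move a block at a fixed bandwidth cap, in seconds."""
    return block_bytes / bandwidth_bytes_per_sec


block = 60 * MIB             # block size reported in the original message (approximate)
default_bandwidth = 1 * MIB  # default throttle: 1048576 bytes/s

print(transfer_seconds(block, default_bandwidth))  # 60.0

# Hypothetical higher cap, expressed in bytes per second, which is the unit
# expected by `hdfs dfsadmin -setBalancerBandwidth`.
target_bandwidth = 100 * MIB
print(target_bandwidth)                            # 104857600
print(transfer_seconds(block, target_bandwidth))   # 0.6
```

If the throttle really is the cause, raising the cap should shrink the move time roughly in proportion, until the network or disks become the limit.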

Best,
Andrew

On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am working on a research project where we are looking at algorithms to
> "optimally" distribute data blocks in HDFS nodes. The definition of what is
> optimal is omitted for brevity.
>
> I want to move specific blocks of a file that is *already* in HDFS. I am
> able to achieve it using data transfer protocol (took cues from "Balancer"
> module). But the operation turns out to be very time consuming. In my
> cluster setup, to move 1 block of data (approximately 60 MB) from
> data-node-1 to data-node-2 it takes nearly 60 seconds. A "dfs -put"
> operation that copies the same file from data-node-1's local file system to
> data-node-2 takes just 1.4 seconds.
>
> Any suggestions on how to speed up the movement of specific blocks?
> Bringing down the running time is very important for us because this
> operation may happen while executing a job.
>
> I am using Hadoop 1.0.4.
>
> Thanks in advance!
>
> Best,
> Karthiek
>