Re: Regarding moving specific blocks of data in HDFS
Hi Andrew,

Thank you for the quick response. I changed the bandwidth using the
"hadoop dfsadmin -setBalancerBandwidth" command and it works like a charm!
The time to transfer data now scales inversely with the bandwidth I set.
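
As a rough sanity check on the numbers reported in this thread, the sketch below estimates how long one throttled block move should take. It assumes the move is limited only by the balancer bandwidth setting and ignores protocol overhead. The helper names `transfer_seconds` and `balancer_bandwidth_command` are hypothetical, and the 50 MB/s figure is only an illustrative value, not one mentioned in the thread.

```python
MB = 1024 * 1024  # bytes per megabyte (binary), as HDFS sizes are usually quoted


def transfer_seconds(block_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Estimate the time to move one block when throttled to the given bandwidth."""
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return block_bytes / bandwidth_bytes_per_sec


def balancer_bandwidth_command(mb_per_sec: int) -> str:
    """Build the dfsadmin command line; the argument is in bytes per second."""
    return f"hdfs dfsadmin -setBalancerBandwidth {mb_per_sec * MB}"


block = 60 * MB  # roughly the block size reported in the original question

# With the default 1 MB/s limit, the estimate matches the ~60 s observed.
print(transfer_seconds(block, 1 * MB))   # 60.0

# Hypothetical higher limit (assumption for illustration only).
print(transfer_seconds(block, 50 * MB))  # 1.2
print(balancer_bandwidth_command(50))    # hdfs dfsadmin -setBalancerBandwidth 52428800
```

On Hadoop 1.x the same option is reached through `hadoop dfsadmin`, as used in the message above.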

Thanks again!

Best,
Karthiek
On Wed, Dec 18, 2013 at 6:23 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:

> Hi Karthiek,
>
> I haven't checked 1.0.4, but in 2.2.0 and onwards, there's this setting you
> can tweak up:
>
> dfs.datanode.balance.bandwidthPerSec
>
> By default, it's set to just 1 MB/s, which is pretty slow. Again, at least in
> 2.2.0, there's also `hdfs dfsadmin -setBalancerBandwidth`, which can be used
> to adjust this config property at runtime.
>
> Best,
> Andrew
>
>
> On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I am working on a research project where we are looking at algorithms to
> > "optimally" distribute data blocks in HDFS nodes. The definition of what is
> > optimal is omitted for brevity.
> >
> > I want to move specific blocks of a file that is *already* in HDFS. I am
> > able to achieve it using data transfer protocol (took cues from "Balancer"
> > module). But the operation turns out to be very time consuming. In my
> > cluster setup, to move 1 block of data (approximately 60 MB) from
> > data-node-1 to data-node-2 it takes nearly 60 seconds. A "dfs -put"
> > operation that copies the same file from data-node-1's local file system to
> > data-node-2 takes just 1.4 seconds.
> >
> > Any suggestions on how to speed up the movement of specific blocks?
> > Bringing down the running time is very important for us because this
> > operation may happen while executing a job.
> >
> > I am using hadoop-1.0.4 version.
> >
> > Thanks in advance!
> >
> > Best,
> > Karthiek
> >
>