Re: Regarding moving specific blocks of data in HDFS
Karthiek C 2013-12-18, 23:59
Thank you for the quick response. I changed the bandwidth using the
"-setBalancerBandwidth" command and it works like a charm! The time to transfer
data now scales inversely with the bandwidth I set.
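
For anyone reading this later, here is a rough back-of-the-envelope sketch of why the cap matters. It only models the bandwidth limit and ignores protocol and disk overhead. The 60 MB block size and the 1 MB/s default come from this thread; the helper name `transfer_seconds` and the 10 and 50 MB/s caps are illustrative assumptions, not values anyone reported here.

```python
MB = 1024 * 1024  # bytes per megabyte


def transfer_seconds(block_bytes: int, bandwidth_bytes_per_sec: int) -> float:
    """Lower bound on the time to move one block under a bandwidth cap.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return block_bytes / bandwidth_bytes_per_sec


block = 60 * MB  # approximate block size mentioned in the thread

# 1 MB/s is the default cap mentioned in the thread; 10 and 50 are made-up caps.
for cap_mb in (1, 10, 50):
    seconds = transfer_seconds(block, cap_mb * MB)
    print(f"{cap_mb} MB/s cap -> about {seconds:.1f} s per block")
```

With the 1 MB/s default, this gives about 60 seconds for a 60 MB block, which is at least consistent with the timing in my original message below.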
On Wed, Dec 18, 2013 at 6:23 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> Hi Karthiek,
> I haven't checked 1.0.4, but in 2.2.0 and onwards, there's this setting you
> can tweak up:
> By default, it's set to just 1 MB/s, which is pretty slow. Again, at least in
> 2.2.0, there's also `hdfs dfsadmin -setBalancerBandwidth`, which can be used
> to adjust this config property at runtime.
> On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <[EMAIL PROTECTED]> wrote:
> > Hi all,
> > I am working on a research project where we are looking at algorithms to
> > "optimally" distribute data blocks across HDFS nodes. The definition of
> > "optimal" is omitted for brevity.
> > I want to move specific blocks of a file that is *already* in HDFS. I am
> > able to achieve it using the data transfer protocol (took cues from
> > module). But the operation turns out to be very time consuming. In my
> > cluster setup, to move 1 block of data (approximately 60 MB) from
> > data-node-1 to data-node-2 it takes nearly 60 seconds. A "dfs -put"
> > operation that copies the same file from data-node-1's local file system
> > to data-node-2 takes just 1.4 seconds.
> > Any suggestions on how to speed up the movement of specific blocks?
> > Bringing down the running time is very important for us because this
> > operation may happen while executing a job.
> > I am using Hadoop version 1.0.4.
> > Thanks in advance!
> > Best,
> > Karthiek