Regarding your question about a pluggable module to control placement of
data, try taking a look at the abstract class BlockPlacementPolicy and
BlockPlacementPolicyDefault, which is its default implementation.
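
To make the pluggable idea concrete, here is a minimal standalone sketch of the pattern: an abstract policy, a default implementation, and a custom one loaded by class name. Every name in it (PlacementSketch, PlacementPolicy, DefaultPolicy, AvoidWriterPolicy, chooseTargets, loadPolicy) is hypothetical and made up for illustration. It does not use the Hadoop API, and the real abstract class has a different, richer method signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PlacementSketch {

    /** Hypothetical stand-in for an abstract placement policy (NOT the Hadoop class). */
    public abstract static class PlacementPolicy {
        /** Pick up to `replicas` distinct nodes from `liveNodes` to hold a new block. */
        public abstract List<String> chooseTargets(String writerNode, int replicas,
                                                   List<String> liveNodes);
    }

    /** Illustrative default: the writer's node first (if live), then list order. */
    public static class DefaultPolicy extends PlacementPolicy {
        @Override
        public List<String> chooseTargets(String writerNode, int replicas,
                                          List<String> liveNodes) {
            List<String> targets = new ArrayList<>();
            if (replicas > 0 && liveNodes.contains(writerNode)) {
                targets.add(writerNode);
            }
            for (String node : liveNodes) {
                if (targets.size() >= replicas) break;
                if (!targets.contains(node)) targets.add(node);
            }
            return targets;
        }
    }

    /** Illustrative custom policy: use the writer's node only as a last resort. */
    public static class AvoidWriterPolicy extends PlacementPolicy {
        @Override
        public List<String> chooseTargets(String writerNode, int replicas,
                                          List<String> liveNodes) {
            List<String> targets = new ArrayList<>();
            for (String node : liveNodes) {
                if (targets.size() >= replicas) break;
                if (!node.equals(writerNode) && !targets.contains(node)) targets.add(node);
            }
            if (targets.size() < replicas && liveNodes.contains(writerNode)) {
                targets.add(writerNode);
            }
            return targets;
        }
    }

    /** Instantiate a policy from its class name, the way a config-driven plugin would be. */
    public static PlacementPolicy loadPolicy(String className)
            throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(PlacementPolicy.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        PlacementPolicy policy = loadPolicy(AvoidWriterPolicy.class.getName());
        System.out.println(policy.chooseTargets("dn1", 3, nodes));
    }
}
```

In HDFS itself the NameNode does the loading for you. As far as I recall, the implementation class is chosen through the dfs.block.replicator.classname property, but please verify that against the version you are running. Your own policy would extend BlockPlacementPolicy, and BlockPlacementPolicyDefault is the reference to read first.
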
On branch-1, you can find these classes
at src/hdfs/org/apache/hadoop/hdfs/server/namenode. On trunk, the package
structure is different, and these classes are
Best of luck with your research!
On Fri, Feb 22, 2013 at 11:17 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> There are no filesystem (i.e. client) level APIs to do this, but the
> Balancer tool of HDFS does exactly this. Reading its sources should
> let you understand what kinda calls you need to make to reuse the
> balancer protocol and achieve what you need.
> In trunk, the balancer is at
> HTH, and feel free to ask any relevant follow up questions.
> On Fri, Feb 22, 2013 at 11:43 PM, Karthiek C <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Are there any APIs to move data blocks in HDFS from one node to another
> > *after* they have been added to HDFS? Also, can we write some sort of
> > pluggable module (like a scheduler) that controls how data gets placed in
> > a Hadoop cluster? I am working with hadoop-1.0.3 and I couldn't find
> > any filesystem APIs available to do that.
> > PS: I am working on a research project where we want to investigate how
> > to optimally place data in Hadoop.
> > Thanks,
> > Karthiek
> Harsh J