-
APIs to move data blocks within HDFS
Karthiek C 2013-02-22, 18:13
Hi,
Are there any APIs to move data blocks in HDFS from one node to another *after* they have been added to HDFS? Also, can we write some sort of pluggable module (like a scheduler) that controls how data gets placed in a Hadoop cluster? I am working with hadoop-1.0.3 and I couldn't find any filesystem APIs to do that.
PS: I am working on a research project where we want to investigate how to optimally place data in Hadoop.
Thanks, Karthiek
-
Re: APIs to move data blocks within HDFS
Harsh J 2013-02-22, 19:17
There are no filesystem-level (i.e. client-level) APIs to do this, but the HDFS Balancer tool does exactly this. Reading its source should show you which calls you need to make to reuse the balancer protocol and achieve what you need.
In trunk, the balancer is at hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
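As a rough orientation before reading Balancer.java: the balancer compares each datanode's utilization with the cluster-wide utilization and treats nodes that deviate by more than a threshold as sources or targets for block moves. The sketch below is a toy model of that classification step only. It assumes a map from node name to {used, capacity}; BalancerSketch and its methods are hypothetical names, not Hadoop classes, and it does not use the balancer protocol.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: BalancerSketch and its methods are hypothetical, not Hadoop classes.
class BalancerSketch {

    // Percentage of a node's capacity in use; usage = {usedBytes, capacityBytes}.
    static double utilization(long[] usage) {
        return 100.0 * usage[0] / usage[1];
    }

    // Cluster-wide utilization: total used bytes over total capacity, as a percentage.
    static double clusterUtilization(Map<String, long[]> nodes) {
        long used = 0;
        long capacity = 0;
        for (long[] u : nodes.values()) {
            used += u[0];
            capacity += u[1];
        }
        return 100.0 * used / capacity;
    }

    // Nodes more than thresholdPct above the cluster utilization (candidate sources).
    static List<String> overUtilized(Map<String, long[]> nodes, double thresholdPct) {
        double avg = clusterUtilization(nodes);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> e : nodes.entrySet()) {
            if (utilization(e.getValue()) > avg + thresholdPct) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Nodes more than thresholdPct below the cluster utilization (candidate targets).
    static List<String> underUtilized(Map<String, long[]> nodes, double thresholdPct) {
        double avg = clusterUtilization(nodes);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, long[]> e : nodes.entrySet()) {
            if (utilization(e.getValue()) < avg - thresholdPct) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, long[]> nodes = new LinkedHashMap<>();
        nodes.put("dn1", new long[] {90, 100});
        nodes.put("dn2", new long[] {50, 100});
        nodes.put("dn3", new long[] {10, 100});
        System.out.println("over:  " + overUtilized(nodes, 10.0));
        System.out.println("under: " + underUtilized(nodes, 10.0));
    }
}
```

With the sample data in main, dn1 is classified as over-utilized and dn3 as under-utilized. The real tool then has to pick which blocks to move and ask datanodes to copy them, which is the part worth reading in the sources.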
HTH, and feel free to ask any relevant follow-up questions.
-- Harsh J
-
Re: APIs to move data blocks within HDFS
Chris Nauroth 2013-02-22, 19:46
Regarding your question about a pluggable module to control placement of data, try taking a look at the abstract class BlockPlacementPolicy and BlockPlacementPolicyDefault, which is its default implementation.
On branch-1, you can find these classes at src/hdfs/org/apache/hadoop/hdfs/server/namenode. On trunk, the package structure is different, and these classes are at hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement.
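To illustrate the general shape of such a pluggable policy, here is a minimal, self-contained sketch: an abstract base class, one concrete policy, and a loader that instantiates an implementation from its class name. Every name in it (PlacementPolicySketch, FirstNodesPolicy, chooseTargets, load) is hypothetical; the real BlockPlacementPolicy has a different, richer signature, so treat this as an analogy rather than Hadoop's API. To my knowledge the NameNode picks its implementation through a configuration property (dfs.block.replicator.classname), but please confirm that in the sources for your branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an abstract placement policy; not Hadoop's BlockPlacementPolicy.
abstract class PlacementPolicySketch {

    // Pick up to 'replicas' nodes from the live nodes to hold a new block.
    abstract List<String> chooseTargets(int replicas, List<String> liveNodes);

    // Instantiate a policy from its class name, the way a config-driven plugin would be loaded.
    static PlacementPolicySketch load(String className) throws ReflectiveOperationException {
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        return (PlacementPolicySketch) instance;
    }
}

// Trivial example policy: always take the first nodes in the list.
class FirstNodesPolicy extends PlacementPolicySketch {
    @Override
    List<String> chooseTargets(int replicas, List<String> liveNodes) {
        int count = Math.min(replicas, liveNodes.size());
        return new ArrayList<>(liveNodes.subList(0, count));
    }
}

class PlacementDemo {
    public static void main(String[] args) throws Exception {
        // In a real deployment the class name would come from configuration.
        PlacementPolicySketch policy = PlacementPolicySketch.load(FirstNodesPolicy.class.getName());
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
        System.out.println(policy.chooseTargets(3, nodes));
    }
}
```

A research policy would replace the body of chooseTargets with its own selection logic while the loading mechanism stays the same.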
Best of luck with your research!
--Chris
-
Re: APIs to move data blocks within HDFS
Karthiek C 2013-02-22, 21:44
Thank you Harsh and Chris. This really helps!
-Karthiek