If you're looking for the online solution, Aaron's just posted a
working implementation of it at
As for the offline or asynchronous disk balancer discussed in
https://issues.apache.org/jira/browse/HDFS-1312: if you want your tool
to be part of the upstream project, I'd encourage you to post your
design for review and comments first, and the implementation
afterwards, so that all the finer points get covered. The offline tool
is the easiest one to write, and it could also live in Python (outside
of HDFS, hosted in a GitHub repo perhaps), since it doesn't really have
to use the DN or NN protocol calls. Once you understand the block data
directory structure (run ls -l on one of your dfs.data.dir or
dfs.datanode.data.dir directories and follow it down), you should be
able to write one fairly easily.
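Because the offline tool only has to shuffle files between data
directories while the DataNode is stopped, here is a rough Python
sketch of what it might look like. It assumes the usual on-disk naming
(each block is a blk_<id> file sitting next to a
blk_<id>_<genstamp>.meta checksum file). It also assumes that tmp, rbw
and blocksBeingWritten are the in-progress directories to leave alone.
Each block keeps the same path relative to its volume, so the
current/... layout is preserved. The names scan_volume, plan_moves and
apply_moves are made up for illustration, and nothing here talks to
the DN or NN.

```python
import os
import re
import shutil
import sys
from typing import Dict, List, Tuple

# A finalized block is a data file "blk_<id>" (ids may be negative on older
# releases) stored next to its checksum file "blk_<id>_<genstamp>.meta".
BLOCK_RE = re.compile(r"^blk_-?\d+$")

# Assumption: these directories hold temporary or still-being-written
# replicas and should not be touched by an offline balancer.
SKIP_DIRS = {"tmp", "rbw", "blocksBeingWritten"}

# block path -> (meta path, combined size in bytes)
BlockIndex = Dict[str, Tuple[str, int]]


def scan_volume(root: str) -> BlockIndex:
    """Find every block/meta pair under one data directory."""
    found: BlockIndex = {}
    for dirpath, dirs, files in os.walk(root):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]  # prune in place
        for name in files:
            if not BLOCK_RE.match(name):
                continue
            metas = [f for f in files
                     if f.startswith(name + "_") and f.endswith(".meta")]
            if len(metas) != 1:
                continue  # orphaned or ambiguous: leave it where it is
            block = os.path.join(dirpath, name)
            meta = os.path.join(dirpath, metas[0])
            size = os.path.getsize(block) + os.path.getsize(meta)
            found[block] = (meta, size)
    return found


def plan_moves(volumes: List[str]) -> List[Tuple[str, str, str]]:
    """Greedily plan moves from the fullest to the emptiest volume.

    Returns (block_path, meta_path, destination_dir) tuples in the order
    they should be applied; nothing on disk is modified.
    """
    if len(volumes) < 2:
        return []
    index = {v: scan_volume(v) for v in volumes}
    used = {v: sum(size for _meta, size in index[v].values()) for v in volumes}
    moves = []
    while True:
        src = max(volumes, key=lambda v: used[v])
        dst = min(volumes, key=lambda v: used[v])
        gap = used[src] - used[dst]
        # Moving s bytes turns the src/dst gap into |gap - 2s|, which only
        # shrinks when 0 < s < gap. Each such move strictly lowers the sum
        # of squared usages, so the loop terminates.
        fits = [(size, path) for path, (_meta, size) in index[src].items()
                if 0 < size < gap]
        if not fits:
            break
        size, block = max(fits)  # largest block that still helps
        meta, _ = index[src].pop(block)
        # Keep the same path relative to the volume (current/..., subdirN).
        rel_dir = os.path.relpath(os.path.dirname(block), src)
        dst_dir = os.path.normpath(os.path.join(dst, rel_dir))
        index[dst][os.path.join(dst_dir, os.path.basename(block))] = (
            os.path.join(dst_dir, os.path.basename(meta)), size)
        used[src] -= size
        used[dst] += size
        moves.append((block, meta, dst_dir))
    return moves


def apply_moves(moves: List[Tuple[str, str, str]]) -> None:
    """Execute a plan. Only safe while the DataNode is shut down."""
    for block, meta, dst_dir in moves:
        os.makedirs(dst_dir, exist_ok=True)
        for path in (block, meta):
            shutil.move(path, os.path.join(dst_dir, os.path.basename(path)))


if __name__ == "__main__":
    # Dry run: print the plan for the data directories given as arguments.
    for block, _meta, dst_dir in plan_moves(sys.argv[1:]):
        print(f"{block} -> {dst_dir}")
```

Running it with your data directories as arguments only prints a
plan. apply_moves should be called only with the DataNode shut down,
and an fsck after restarting is worth doing before trusting it on real
data. The greedy "largest block that still narrows the gap" rule is
just one simple choice. A proposal on the JIRA might well prefer
something smarter, such as a target utilisation per volume.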
On Wed, Apr 3, 2013 at 6:36 PM, Kevin Lyda <[EMAIL PROTECTED]> wrote:
> I've been following https://issues.apache.org/jira/browse/HDFS-1312
> and really need the balancing tool described therein. I'd be
> interested in writing it, but am not sure where to start. I'm more
> comfortable in Python, but I suspect it has a better chance of being
> integrated if I do it in Java.
> Is hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop the
> place to look for interfaces to manipulate the filesystem?
> Kevin Lyda
> Galway, Ireland