Hadoop, mail # general - Transfer blocks from one datanode to another

Adrian 2012-09-06, 09:42
Re: Transfer blocks from one datanode to another
Harsh J 2012-09-06, 13:05

Please do not use the general@ lists for development/usage questions.
This list is meant for project-level discussions alone. Thanks! :)

I've moved this mail to [EMAIL PROTECTED]. When replying,
please use that list instead, going forward.

My reply inline:

On Thu, Sep 6, 2012 at 3:12 PM, Adrian (Xinyu) Liu <[EMAIL PROTECTED]> wrote:
> Hi All,
> Nowadays, I am working with HDFS and implementing some functionalities base on HDFS API.
> As I knew, one specific file is divided into several blocks and distributed into different datanode with certain replication numbers.
> And I want to find out a series of HDFS API which can meet the following requirement:
> 1.       Given a specific filename and related information that already uploaded into the HDFS, retrieve: how many blocks are there,
> each datanode contain which blocks, etc.

Not all of this is available if you're using only the simple public APIs.

The FileSystem#getFileBlockLocations will tell you what hosts are
carrying the blocks of a file (a list of hosts for each block in the
file). See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
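
As a rough sketch of how one might turn that per-block host list into
a per-datanode view: the class and method names below are made up for
illustration and are not part of Hadoop, and the input array only
stands in for what BlockLocation#getHosts() would return for each
entry of the getFileBlockLocations result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a Hadoop class.
class BlockHostIndex {

    /**
     * Inverts a "hosts per block" listing into "blocks per host".
     *
     * hostsPerBlock[i] is assumed to be the host list of the i-th block of
     * the file, i.e. what BlockLocation#getHosts() would return for the
     * i-th element of FileSystem#getFileBlockLocations(...).
     */
    static Map<String, List<Integer>> blocksByHost(String[][] hostsPerBlock) {
        // TreeMap keeps the hosts sorted so the output is stable.
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int block = 0; block < hostsPerBlock.length; block++) {
            for (String host : hostsPerBlock[block]) {
                index.computeIfAbsent(host, h -> new ArrayList<>()).add(block);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample data: a two-block file with replication factor 2.
        String[][] sample = { {"dn1", "dn2"}, {"dn2", "dn3"} };
        System.out.println("blocks in file: " + sample.length);
        System.out.println(blocksByHost(sample));
    }
}
```

Note that the integers here are only positions of blocks within the
file, not HDFS block IDs; the public API does not expose those.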

For the list of block IDs, you'd have to pull them from a DFSClient
instance, which calls the (NameNode-side) ClientProtocol's
getBlockLocations(…) method. See the interface at

> 2.       Given a specific filename, source datanode, specific block id, destination datanode and related information to transfer the block
> from source node to destination node.

This needs to be done via the DataTransferProtocol, specifically its
replaceBlock(…) method. See the interface at

> I've read several materials and API reference about HDFS and can't find appropriate ones, the objective of this mail is to make sure
> if such  API existed, and if so, what are they (especially the second one, transfer a specific block of a specific file from a certain datanode to another)
> There is a tool called Balancer already existed in HDFS package, I am reading the source code, but it's too intricate to track the line, can anyone help me?

In the Balancer sources, see the final replaceBlock(…) call made at
L376 at http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java?view=markup,
and then trace backwards from that point to see how the call is built
up.

Feel free to send across any more questions you have!

Harsh J