Hadoop >> mail # general >> Transfer blocks from one datanode to another


Adrian 2012-09-06, 09:42
Re: Transfer blocks from one datanode to another
Hi,

Please do not use the general@ lists for development/usage questions.
This list is meant for project-level discussions alone. Thanks! :)

I've moved this mail to [EMAIL PROTECTED]. When replying,
please use this list instead, going forward.

My reply inline:

On Thu, Sep 6, 2012 at 3:12 PM, Adrian (Xinyu) Liu <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> Nowadays, I am working with HDFS and implementing some functionality based on the HDFS API.
> As I understand it, a given file is divided into several blocks, which are distributed across different datanodes with a certain replication factor.
> I want to find a set of HDFS APIs that can meet the following requirements:
>
>
> 1.       Given a specific filename (and related information) for a file already uploaded to HDFS, retrieve: how many blocks there are,
>
> which blocks each datanode contains, etc.

This isn't fully possible with the simple public APIs alone.

The FileSystem#getFileBlockLocations will tell you what hosts are
carrying the blocks of a file (a list of hosts for each block in the
file). See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
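
To make that concrete, here is a minimal sketch of using FileSystem#getFileBlockLocations. It assumes the default filesystem in your Configuration points at the cluster, and takes the HDFS path as its first argument; one BlockLocation is returned per block when you ask for the whole file length:

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path(args[0]); // an HDFS path, e.g. /user/adrian/data.txt
    FileStatus status = fs.getFileStatus(file);

    // Ask for locations covering [0, fileLength): one entry per block.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    System.out.println("Number of blocks: " + blocks.length);
    for (BlockLocation loc : blocks) {
      // Each location carries the block's offset/length within the file
      // and the datanode hostnames holding its replicas.
      System.out.println("offset=" + loc.getOffset()
          + " length=" + loc.getLength()
          + " hosts=" + Arrays.toString(loc.getHosts()));
    }
    fs.close();
  }
}
```

Note this gives you hosts per block, but not the block IDs themselves.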

For the list of block IDs, you'd have to pull from a DFSClient
instance, which calls the (NameNode-side) ClientProtocol's
getBlockLocations(…) method call. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java?view=markup
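
A rough sketch of that approach follows. DFSClient is an internal (private-audience) class, so its constructors and methods may change between releases; this assumes the DFSClient(Configuration) convenience constructor and the getLocatedBlocks(…) call present in branch-2:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

public class BlockIds {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Internal API: not covered by compatibility guarantees.
    DFSClient client = new DFSClient(conf);

    String src = args[0]; // an HDFS path as a string
    // Under the hood this invokes ClientProtocol#getBlockLocations(...)
    // on the NameNode.
    LocatedBlocks lbs = client.getLocatedBlocks(src, 0, Long.MAX_VALUE);

    for (LocatedBlock lb : lbs.getLocatedBlocks()) {
      System.out.print("blockId=" + lb.getBlock().getBlockId() + " on:");
      for (DatanodeInfo dn : lb.getLocations()) {
        System.out.print(" " + dn.getHostName());
      }
      System.out.println();
    }
    client.close();
  }
}
```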

> 2.       Given a specific filename, source datanode, specific block ID, destination datanode and related information, transfer the block
>
> from the source node to the destination node.

This needs to be done via the DataTransferProtocol, specifically its
replaceBlock(…) method. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java?view=markup
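
As a very rough, hedged sketch of how that wire call is issued (these are internal classes whose signatures vary across releases; the Sender writer and this replaceBlock signature are what branch-2's Balancer uses, and the block, token, delHint and source values here are assumed to have been obtained elsewhere):

```java
import java.io.DataOutputStream;
import java.net.Socket;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.protocol.datatransfer.Sender;
import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;
import org.apache.hadoop.security.token.Token;

public class MoveBlock {
  // Ask the destination datanode (reached via 'toDestination') to copy
  // 'block' from 'source', hinting that the replica on 'delHint' can be
  // deleted afterwards -- the same request the Balancer sends.
  static void requestReplaceBlock(Socket toDestination, ExtendedBlock block,
      Token<BlockTokenIdentifier> token, String delHint,
      DatanodeInfo source) throws Exception {
    DataOutputStream out =
        new DataOutputStream(toDestination.getOutputStream());
    new Sender(out).replaceBlock(block, token, delHint, source);
    out.flush();
    // The destination replies with a status response; the Balancer reads
    // and checks it before considering the move complete.
  }
}
```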

> I've read several materials and API references about HDFS but can't find appropriate ones. The objective of this mail is to confirm
> whether such APIs exist, and if so, what they are (especially the second one: transferring a specific block of a specific file from one datanode to another).
>
> There is a tool called Balancer that already exists in the HDFS package. I am reading its source code, but it's too intricate to follow; can anyone help me?

In the Balancer sources, see the final replaceBlock(…) call made at
L376 at http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java?view=markup,
and then trace backwards from there to see how the call is built up to
that point.

Feel free to send across any more questions you have!

--
Harsh J