HDFS, mail # user - Accessing list of blocks on a datanode via Java interface

Yaron Gonen 2012-07-06, 14:28
Harsh J 2012-07-06, 14:58
Yaron Gonen 2012-07-06, 17:53
Re: Accessing list of blocks on a datanode via Java interface
Harsh J 2012-07-06, 18:06
Does HDFS's replication feature not do this automatically and more
effectively for you?

I think for backups you should look at the DistCp tool, which backs up
at the file level rather than making granular block-level copies. It
can do incremental copies too, AFAICT.

In any case, if you wish to have a list of all blocks at each DN,
either parse out the info returned via "dfsadmin -metasave", "fsck
-files -blocks -locations", or ls -lR the DN's data dir.
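
As a rough illustration of the last option (walking the DN's data dir),
here is a minimal Java sketch. It is not an HDFS API: the class name
BlockFileLister and its method are made up for this example. It assumes
block data files are named blk_<ID> (allowing a leading minus sign on
the ID is my assumption) and that their checksum files carry a .meta
suffix. It only reads the local filesystem, so it has to run on the
datanode host, once per directory listed in dfs.datanode.data.dir.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop.
final class BlockFileLister {
    // Matches block data files such as blk_123; the optional minus sign is an
    // assumption. Names like blk_123_1001.meta do not match and are skipped.
    private static final Pattern BLOCK_FILE = Pattern.compile("^blk_-?\\d+$");

    // Recursively collects every regular file under dataDir whose name matches.
    static List<Path> listBlockFiles(Path dataDir) throws IOException {
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> BLOCK_FILE.matcher(p.getFileName().toString()).matches())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: one directory taken from dfs.datanode.data.dir in the DN's config.
        for (Path block : listBlockFiles(Paths.get(args[0]))) {
            System.out.println(block + "\t" + Files.size(block));
        }
    }
}
```

Blocks can appear or disappear while the walk is running, so the result
is only a snapshot of that one directory at that moment.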

On Fri, Jul 6, 2012 at 11:23 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
> Thanks for the fast reply.
> My top goal is to back up any new blocks on the DN.
> What I'd like to do is go over all the blocks on the DN and compute a
> signature for each of them. I'll compare that signature with a backup
> server.
> I guess another feature will be to check only new blocks, so I'll have to
> look at the metadata of each block.
> On Jul 6, 2012 5:59 PM, "Harsh J" <[EMAIL PROTECTED]> wrote:
>> When you say 'scan blocks on that datanode', what do you mean to do by
>> 'scan'? If you want merely a list of blocks per DN at a given time,
>> there are ways to get that. However, if you then want to perform
>> operations on each of these blocks remotely, that's not possible.
>> In any case, you can run whatever program you wish to agnostically on
>> any DN by running it on the dfs.datanode.data.dir directories of the
>> DN (take it from its config), and visiting all files with the format
>> ^blk_<ID number>$.
>> We can help you better if you tell us what exactly you are attempting
>> to do, for which you need a list of all the blocks per DN.
>> On Fri, Jul 6, 2012 at 7:58 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I'm trying to write an agent that will run on a datanode and scan the
>> > blocks on that datanode.
>> > The logical thing to do is to look in the DataBlockScanner code, which
>> > lists all the blocks on a node, which is what I did.
>> > The problem is that the DataBlockScanner object is instantiated during
>> > the start-up of a DataNode, so a lot of the objects it needs (like
>> > FSDataSet) are already instantiated.
>> > Then I tried DataNode.getDataNode(), but it returned null (needless to
>> > say, the node is up and running).
>> > I'd be grateful if you could refer me to the right object or to a guide.
>> >
>> > I'm new to HDFS, so I'm sorry if it's a trivial question.
>> >
>> > Thanks,
>> > Yaron
>> --
>> Harsh J

Harsh J
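
For the "signature per block" idea quoted above, one possible shape is a
plain content hash of each local block file, plus a modification-time
check to skip blocks older than the last backup. This is only a sketch
under assumptions: BlockSignature, sha256Hex and isNewerThan are
hypothetical names, SHA-256 is my choice of signature, and using the
file's mtime as the "new block" test is a guess at what "metadata" would
mean here. Nothing in it talks to HDFS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop.
final class BlockSignature {
    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumption: a block counts as "new" if its file was modified after the
    // last backup time (milliseconds since the epoch).
    static boolean isNewerThan(Path file, long lastBackupMillis) throws IOException {
        return Files.getLastModifiedTime(file).toMillis() > lastBackupMillis;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a block file; args[1]: last backup time in millis.
        Path block = Paths.get(args[0]);
        if (isNewerThan(block, Long.parseLong(args[1]))) {
            System.out.println(sha256Hex(block) + "  " + block);
        }
    }
}
```

The digest could then be compared with whatever the backup server has
recorded for the same block ID.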
Yaron Gonen 2012-07-06, 18:58
Harsh J 2012-07-07, 01:50
Yaron Gonen 2012-07-07, 09:17