Re: store file gives exception
Nitin is right. The Hadoop JobTracker schedules a job based on the locations
of the data blocks and the computing capacity of each node.

Based on the number of data blocks, the JobTracker splits a job into map
tasks. Optimally, each map task is scheduled on a node that holds a local
copy of its data. Because a block may be replicated on multiple nodes, the
JobTracker picks one of those nodes for each task according to its own rules
(graylisting, the configured scheduler, etc.).
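
To make the block/split relationship concrete, here is a minimal sketch
(assuming the classic org.apache.hadoop.mapred API and a hypothetical input
path) that prints each input split together with the hosts holding its block;
these are the locality hints the JobTracker uses when assigning map tasks:

import java.util.Arrays;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class ShowSplitLocations {
  public static void main(String[] args) throws Exception {
    // Hypothetical input path; replace with your own directory in HDFS.
    JobConf conf = new JobConf();
    FileInputFormat.setInputPaths(conf, new Path("/user/ec2-user/randtext2"));

    TextInputFormat format = new TextInputFormat();
    format.configure(conf);

    // Each split reports the hosts that hold a replica of its block;
    // the framework prefers to run the corresponding map task on one of them.
    for (InputSplit split : format.getSplits(conf, 1)) {
      System.out.println(split + " -> " + Arrays.toString(split.getLocations()));
    }
  }
}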

BTW, if you do want to check the locations of the data blocks on HDFS, you
can use the following command:

hadoop fsck /user/ec2-user/randtext2/part-00000 -files -blocks -locations

And the output should be similar to:
FSCK started by ec2-user from /10.147.166.55 for path
/user/ec2-user/randtext2/part-00000 at Wed Mar 06 10:32:51 EST 2013
/user/ec2-user/randtext2/part-00000 1102234512 bytes, 17 block(s):  OK
0. blk_-1304750065421421106_1311 len=67108864 repl=2 [10.145.223.184:50010,
10.152.166.137:50010]
1. blk_-2917797815235442294_1315 len=67108864 repl=2 [10.145.231.46:50010,
10.152.166.137:50010]
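
The same information is also available programmatically. A minimal sketch,
assuming the standard FileSystem API and the same file as in the fsck example
above, which prints each block and the datanodes holding its replicas:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Same file as in the fsck example above.
    Path file = new Path("/user/ec2-user/randtext2/part-00000");
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per block, listing the datanodes with a replica.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " len=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}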

Shumin-

On Wed, Mar 6, 2013 at 7:35 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> In Hadoop you don't have to worry about data locality. The JobTracker
> will by default try to schedule a job where its data is located, provided
> the node has enough compute capacity. Also note that a datanode just stores
> blocks of a file, and multiple datanodes will hold different blocks of the
> same file.
>
>
> On Wed, Mar 6, 2013 at 5:52 PM, AMARNATH, Balachandar <
> [EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I thought the issue below was caused by a lack of free space. I replaced
>> the datanodes with nodes that have more space, and it worked.
>>
>> Now I have a working HDFS cluster. I am thinking about my application,
>> where I need to execute 'a set of similar instructions' (a job) over a
>> large number of files. I plan to run this in parallel on different
>> machines, and I would like to schedule each job on a datanode that already
>> holds its input file. I will store the files in HDFS first. To complete my
>> task, is there a scheduler in the Hadoop framework that, given the input
>> file required for a job, can return the name of the datanode where the
>> file is actually stored? Am I making sense here?
>>
>> Regards
>> Bala
>>
>> From: AMARNATH, Balachandar [mailto:[EMAIL PROTECTED]]
>> Sent: 06 March 2013 16:49
>> To: [EMAIL PROTECTED]
>> Subject: RE: store file gives exception
>>
>> Hi,
>>
>> I could successfully install a Hadoop cluster with three nodes (2 datanodes
>> and 1 namenode). However, when I tried to store a file, I got the following
>> error.
>>
>> 13/03/06 16:45:56 WARN hdfs.DFSClient: Error Recovery for block null bad
>> datanode[0] nodes == null
>>
>> 13/03/06 16:45:56 WARN hdfs.DFSClient: Could not get block locations.
>> Source file "/user/bala/kumki/hosts" - Aborting...
>>
>> put: java.io.IOException: File /user/bala/kumki/hosts could only be
>> replicated to 0 nodes, instead of 1
>>
>> 13/03/06 16:45:56 ERROR hdfs.DFSClient: Exception closing file
>> /user/bala/kumki/hosts : org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException: File /user/bala/kumki/hosts could only be replicated
>> to 0 nodes, instead of 1
>>
>>             at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>>
>>             at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>>
>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>>             at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>
>>             at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)