Hadoop, mail # user - doubt about reduce tasks and block writes


Re: doubt about reduce tasks and block writes
Raj Vishwanathan 2012-08-26, 17:13
Harsh

I did leave an escape route open with a bit about "corner cases" :-)

Anyway, I agree that HDFS has no notion of a block 0. I just meant that if dfs.replication is 1, then under normal circumstances :-) no blocks of the output file will be written to node A.

Raj
----- Original Message -----
> From: Harsh J <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Raj Vishwanathan <[EMAIL PROTECTED]>
> Cc:
> Sent: Saturday, August 25, 2012 4:02 AM
> Subject: Re: doubt about reduce tasks and block writes
>
> Raj's almost right. In times of high load or space fillup on a local
> DN, the NameNode may decide to instead pick a non-local DN for
> replica-writing. In this way, the Node A may get a "copy 0" of a
> replica from a task. This is per the default block placement policy.
>
> P.s. Note that HDFS hardly makes any difference between replicas,
> hence there is no hard concept of a "copy 0" or "copy 1" block. At the
> NN level, it treats all DNs in a pipeline equally, and the same goes
> for replicas.
>
> On Sat, Aug 25, 2012 at 4:14 AM, Raj Vishwanathan <[EMAIL PROTECTED]>
> wrote:
>>  But since node A has no TT running, it will not run map or reduce tasks.
>>  When the reducer node writes the output file, the first block will be
>>  written on the local node and never on node A.
>>
>>  So, to answer the question, Node A will contain copies of blocks of all
>>  output files. It won't contain the copy 0 of any output file.
>>
>>
>>  I am reasonably sure about this, but there could be corner cases in case
>>  of node failure and the like! I need to look into the code.
>>
>>
>>  Raj
>>> ________________________________
>>>  From: Marc Sturlese <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Friday, August 24, 2012 1:09 PM
>>> Subject: doubt about reduce tasks and block writes
>>>
>>> Hey there,
>>> I have a doubt about reduce tasks and block writes. Does a reduce task
>>> always first write to HDFS on the node where it is placed? (And then these
>>> blocks would be replicated to other nodes.)
>>> If so, and I have a cluster of 5 nodes, 4 of which run a DN and a TT and
>>> one (node A) runs just a DN, would map tasks never read from node A when
>>> running MR jobs? This would be because maps have data locality, and if the
>>> reduce tasks write first to the node where they live, one replica of each
>>> block would always be on a node that has a TT. Node A would just contain
>>> blocks created through replication by the framework, as no reduce task
>>> would write there directly. Is this correct?
>>> Thanks in advance
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>>
>>>
>>>
>
>
>
> --
> Harsh J
>
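
The placement behaviour discussed in this thread can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not HDFS code: the function names, the `unavailable` set (standing in for a DN that is overloaded or out of space), and the uniform random fallback are all hypothetical. The real default placement policy also considers rack topology and other factors that are ignored here.

```python
import random


def choose_first_replica(writer, datanodes, unavailable=frozenset(), rng=random):
    """Toy model: prefer the writer's local DN, else fall back to a random DN.

    `unavailable` is a hypothetical stand-in for DNs that are overloaded
    or out of space; it is not an HDFS concept by that name.
    """
    candidates = [dn for dn in datanodes if dn not in unavailable]
    if not candidates:
        raise RuntimeError("no DataNode can accept the block")
    if writer in candidates:
        return writer  # normal case: the first replica stays local
    return rng.choice(candidates)  # corner case: a non-local DN is picked


def first_replica_counts(writers, datanodes, unavailable=frozenset(),
                         writes=1000, seed=0):
    """Count how many first replicas each DN receives over many block writes."""
    rng = random.Random(seed)
    counts = {dn: 0 for dn in datanodes}
    for _ in range(writes):
        writer = rng.choice(writers)  # a reducer on some TT node
        target = choose_first_replica(writer, datanodes, unavailable, rng)
        counts[target] += 1
    return counts


# Cluster from the original question: node A runs only a DN, B..E run DN + TT.
DATANODES = ["A", "B", "C", "D", "E"]
TASK_NODES = ["B", "C", "D", "E"]

normal = first_replica_counts(TASK_NODES, DATANODES)
degraded = first_replica_counts(TASK_NODES, DATANODES, unavailable={"B"})

print("first replicas on A, all DNs healthy:", normal["A"])
print("first replicas on A, local DN on B unavailable:", degraded["A"])
```

In this model node A receives no first replicas while every task node's local DN is usable, which matches Raj's point for dfs.replication = 1, where the first replica is the only one. Once a local DN is marked unavailable, some first replicas can land on A, which is the corner case Harsh describes.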