Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hadoop >> mail # user >> doubt about reduce tasks and block writes


Re: doubt about reduce tasks and block writes
Assuming that node A only contains replicas, there is no guarantee that its
data will never be read.
First, you might lose a replica. The copy on node A could then be used
to create the missing replica again.
Second, data locality is best-effort. If all the map slots are occupied
except one on a node that holds no replica of the data, then your node A is as
likely as any other node holding a replica to be chosen as the source.
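The second point can be illustrated with a toy scheduler. This Python sketch is a simplification, not the JobTracker's actual logic: `schedule_map`, the node names and the slot counts are hypothetical, rack locality is ignored, and a non-local task is assumed to read from a replica picked uniformly at random (on a single rack no replica is closer than another). Because node A runs no TaskTracker it never hosts a map task, yet it can still serve the block to a non-local one.

```python
import random


def schedule_map(replicas, free_slots, rng):
    """Choose (run_on, read_from) for one map task over one block.

    Simplified best-effort locality (hypothetical model, not Hadoop code):
    prefer a node that holds a replica and has a free map slot, so the read
    is local; otherwise run on any node with a free slot and read the block
    over the network from a replica picked uniformly at random.
    """
    local = [n for n in replicas if free_slots.get(n, 0) > 0]
    if local:
        node = rng.choice(local)
        return node, node  # data-local: the task reads its own node's copy
    free = [n for n, slots in free_slots.items() if slots > 0]
    if not free:
        return None, None  # no free slot anywhere: the task has to wait
    return rng.choice(free), rng.choice(replicas)


# Scenario from the reply: every slot is busy except one on n4, which holds
# no replica. Node A has no TaskTracker, so it never appears in free_slots.
rng = random.Random(42)
replicas = ["n1", "n2", "A"]
free_slots = {"n1": 0, "n2": 0, "n3": 0, "n4": 1}
reads = [schedule_map(replicas, free_slots, rng)[1] for _ in range(3000)]
print({node: reads.count(node) for node in replicas})
```

In this model node A serves roughly a third of the reads in that scenario; the exact split depends on the random seed.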

Regards

Bertrand

On Fri, Aug 24, 2012 at 10:09 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:

> Hey there,
> I have a question about reduce tasks and block writes. Does a reduce task
> always write first to HDFS on the node where it is placed (with the blocks
> then replicated to other nodes)?
> If so, and I have a cluster of 5 nodes where 4 of them run a DN and a TT
> and one (node A) runs only a DN, will map tasks never read from node A when
> running MR jobs? My reasoning is that maps have data locality, and if reduce
> tasks write first to the node they run on, one replica of every block will
> always be on a node that has a TT. Node A would only contain blocks created
> by the framework's replication, since no reduce task would write there
> directly. Is this correct?
> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/doubt-about-reduce-tasks-and-block-writes-tp4003185.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
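Marc's premise matches HDFS's documented default placement policy: the first replica goes on the writer's own node when the writer runs a DataNode, and the other replicas go to other nodes (in the default policy, on a remote rack). The Python sketch below is a simplified toy model under stated assumptions: a single rack (so the rack-aware steps collapse to choosing other DataNodes at random), hypothetical node names n1 to n4 plus A, and made-up helpers `place_replicas` and `re_replicate` that are not Hadoop APIs. It shows that node A still receives second and third replicas from ordinary reducer writes, and that a surviving copy on A can be the source when a lost replica is re-created, as in the first point of the reply.

```python
import random

# Hypothetical cluster from the question: n1..n4 run a DataNode and a
# TaskTracker; node "A" runs only a DataNode. Single rack assumed.
DATANODES = ["n1", "n2", "n3", "n4", "A"]


def place_replicas(writer, datanodes, replication, rng):
    """Simplified default placement on a single rack.

    The first replica goes on the writer's node if it is a DataNode
    (otherwise on a random DataNode); the remaining replicas go on distinct
    other DataNodes chosen at random. Real HDFS also spreads them across racks.
    """
    first = writer if writer in datanodes else rng.choice(datanodes)
    others = [dn for dn in datanodes if dn != first]
    return [first] + rng.sample(others, replication - 1)


def re_replicate(replicas, lost, datanodes, rng):
    """Re-create the replica lost on `lost` from a surviving copy.

    Returns (source, target, new_replicas): `source` is a surviving replica
    that is read, `target` a DataNode that did not hold the block before.
    """
    survivors = [r for r in replicas if r != lost]
    candidates = [dn for dn in datanodes if dn not in replicas]
    source = rng.choice(survivors)
    target = rng.choice(candidates)
    return source, target, survivors + [target]


rng = random.Random(7)
# Reducers run only on TaskTracker nodes, yet A still receives replicas.
writes = [place_replicas("n1", DATANODES, 3, rng) for _ in range(1000)]
print(sum("A" in r for r in writes), "of 1000 blocks have a replica on A")
# Losing n1's copy of a block stored on n1, A and n3: A may be the source.
print(re_replicate(["n1", "A", "n3"], "n1", DATANODES, rng))
```

The first replica always lands on the writing node, so Marc is right that every reducer-written block has one copy on a TaskTracker node; the point of the sketch is that this does not keep node A from holding, and serving, the other copies.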

--
Bertrand Dechoux