Hadoop >> mail # user >> can local disk of reduce task cause the job to fail?


Majid Azimi 2012-12-09, 12:09
Mohit Anchlia 2012-12-09, 17:15
Re: can local disk of reduce task cause the job to fail?
I am new to Hadoop, but I think the output of the completed map tasks is
transferred (copied, shuffled, and sorted) to the reducer nodes even while
some of the mappers are still running. The reduce code itself starts
executing only when all of the map tasks have finished.
That's why you see a small percentage of reducer progress reported as
completed even though mappers are still running.
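
To make the ordering concrete, here is a toy sketch in plain Python. It is not Hadoop code and not how Hadoop implements the shuffle; the names `merge_map_outputs`, `run_reduce`, and the sample `map_outputs` are made up for illustration. It assumes each map task hands over output that is already sorted by key. The sorted runs are merged lazily, and the reduce function is called once per key, with the values passed to it through an iterator.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_map_outputs(sorted_runs):
    """Lazily merge already-sorted (key, value) runs into one sorted stream."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def run_reduce(sorted_runs, reduce_fn):
    """Call reduce_fn once per key over the merged stream of all map outputs."""
    merged = merge_map_outputs(sorted_runs)
    results = {}
    for key, group in groupby(merged, key=itemgetter(0)):
        # Values reach reduce_fn through an iterator, one at a time.
        results[key] = reduce_fn(key, (value for _, value in group))
    return results


# Hypothetical output of three finished map tasks, each sorted by key.
map_outputs = [
    [("a", 1), ("c", 1)],
    [("a", 1), ("b", 1)],
    [("b", 1), ("c", 1), ("c", 1)],
]

counts = run_reduce(map_outputs, lambda key, values: sum(values))
print(counts)  # {'a': 2, 'b': 2, 'c': 3}
```

In this sketch the merge can only yield a key once every run has been handed over, which mirrors the point above: copying can overlap with the map phase, but the reduce calls wait for all map outputs.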
On Sun, Dec 9, 2012 at 12:15 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> The reducer will not start executing until the shuffle and sort phase is complete.
>
> Sent from my iPhone
>
> On Dec 9, 2012, at 4:09 AM, Majid Azimi <[EMAIL PROTECTED]> wrote:
>
> Hi guys,
>
> Hadoop: The Definitive Guide says that reduce tasks will start only when all
> maps have done their work. Also, this link <http://hadoop.apache.org/docs/mapreduce/current/mapred_tutorial.html#Reducer> says:
>
> >> The shuffle and sort phases occur simultaneously; while map-outputs are
> being fetched they are merged.
>
> What I have understood is that when a reduce task starts, all the data it
> needs (including a key and its associated values) has been transferred to its
> local node. Am I right? If this is true, then the node running the reduce
> task must have enough storage to hold all values associated with that
> key, or else the job will fail.
>
> If not, then the reduce task starts with whatever data is available and the
> shuffle + sort phase feeds the reduce task continuously, so low storage on
> the node does not cause a problem because data arrives on demand.
>
> Which of the two cases actually happens?
>
>