Hadoop >> mail # user >> can local disk of reduce task cause the job to fail?
Can the local disk of a reduce task cause the job to fail?
Hi guys,

Hadoop: The Definitive Guide says that reduce tasks start only when all
maps have finished their work. Also, this
link <http://hadoop.apache.org/docs/mapreduce/current/mapred_tutorial.html#Reducer> says:

> The shuffle and sort phases occur simultaneously; while map-outputs are
> being fetched they are merged.
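A toy model of what "merged while being fetched" could look like, assuming each map's output for this reducer arrives already sorted by key: a lazy k-way merge (`heapq.merge`, Python 3.5+) combines the sorted runs into one sorted stream, which is then grouped by key the way a reducer sees its input. The names `map_outputs`, `merge_segments` and `group_by_key` are hypothetical; this is an illustration, not Hadoop's implementation.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical data: each list is one map task's output for this reducer,
# already sorted by key (an assumption of this toy model).
map_outputs = [
    [("apple", 1), ("cherry", 2)],   # fetched from map 1
    [("apple", 3), ("banana", 4)],   # fetched from map 2
    [("banana", 5), ("cherry", 6)],  # fetched from map 3
]

def merge_segments(segments):
    """Lazily k-way merge key-sorted segments into one key-sorted stream."""
    return heapq.merge(*segments, key=itemgetter(0))

def group_by_key(sorted_pairs):
    """Yield (key, iterator over that key's values), as a reducer would see them."""
    for key, pairs in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, (value for _, value in pairs)

if __name__ == "__main__":
    for key, values in group_by_key(merge_segments(map_outputs)):
        print(key, list(values))
```

Because `heapq.merge` pulls one element at a time from each run, the merge itself never needs every run fully in memory; it only needs each run to be sorted.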

What I have understood is that when a reduce task starts, all the data it
needs (including a key and its associated values) has already been
transferred to its local node. Am I right? If this is true, then the node
running the reduce task must have enough storage to hold all the values
associated with that key, or else the job will fail.
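Under the first interpretation, the storage check is simple arithmetic: the reducer's input is the sum of the partitions it receives from every map, and that total must fit on the node at once. A rough sketch, assuming the per-map partition sizes are known; it ignores compression and any extra temporary space that intermediate merges might need, and the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3

def reducer_input_bytes(partition_sizes):
    """Total bytes one reducer must fetch: its partition from every map output."""
    return sum(partition_sizes)

def fits_on_local_disk(partition_sizes, free_bytes):
    """True if the reducer's whole input could be stored in free_bytes at once."""
    return reducer_input_bytes(partition_sizes) <= free_bytes

if __name__ == "__main__":
    # Hypothetical job: 3 maps each send 2 GiB to this reducer.
    sizes = [2 * GIB, 2 * GIB, 2 * GIB]
    free = shutil.disk_usage(".").free  # free bytes on the current filesystem
    print(reducer_input_bytes(sizes) / GIB, "GiB needed; fits:", fits_on_local_disk(sizes, free))
```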

If not, then the reduce task starts with some of the data available, and the
shuffle + sort phase feeds the reduce task continuously, so low storage on
the node does not cause a problem because the data arrives on demand.
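The second interpretation can be pictured as a reducer that consumes pairs from a generator as they arrive and keeps only one running total at a time. This is a hypothetical sketch (`fetched_pairs` stands in for the shuffle), and it only works because the incoming stream is assumed to be sorted by key already; whether the shuffle can deliver such a stream before every map output has been fetched is exactly the question being asked.

```python
from itertools import groupby
from operator import itemgetter

def fetched_pairs():
    """Hypothetical stand-in for the shuffle: yields key-sorted pairs one at a time."""
    for pair in [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("b", 5)]:
        yield pair

def streaming_reduce(pairs):
    """Sum each key's values, holding only the current key's running total."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        total = 0
        for _, value in group:
            total += value
        yield key, total

if __name__ == "__main__":
    for key, total in streaming_reduce(fetched_pairs()):
        print(key, total)
```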

Which of the two cases actually happens?
Mohit Anchlia 2012-12-09, 17:15
jamal sasha 2012-12-09, 17:19