Can local disk of reduce task cause the job to fail?
Majid Azimi 2012-12-09, 12:09
Hi guys,
Hadoop: The Definitive Guide says that reduce tasks start only when all maps have finished their work. Also, this link <http://hadoop.apache.org/docs/mapreduce/current/mapred_tutorial.html#Reducer> says:

>> The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

What I have understood is that when a reduce task starts, all the data it needs (each key and its associated values) has already been transferred to its local node. Am I right?

If this is true, then the node running the reduce task must have enough storage to hold all the values associated with that key, or else the job will fail. If not, then the reduce task starts with whatever data is available and the shuffle + sort phase feeds the reduce task continuously, so low storage on the node does not cause a problem because data arrives on demand.

Which of the two cases actually happens?
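To make the quoted sentence ("while map-outputs are being fetched they are merged") a bit more concrete, here is a toy Python sketch of a lazy k-way merge feeding a reduce function one key at a time. It is only a conceptual model under stated assumptions, not Hadoop's implementation: the names `merge_map_outputs`, `run_reduce` and the sample `map_outputs` are hypothetical, and the sketch says nothing about where the real sorted runs are held (memory or the reducer's local disk), which is the open point of the question.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_map_outputs(sorted_runs):
    """Lazily k-way merge runs of (key, value) pairs, each already sorted by key."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def run_reduce(sorted_runs, reduce_fn):
    """Call reduce_fn(key, values) once per key, passing the values as an iterator."""
    merged = merge_map_outputs(sorted_runs)
    for key, group in groupby(merged, key=itemgetter(0)):
        # The values are streamed; they are never collected into a list here.
        yield key, reduce_fn(key, (value for _, value in group))


# Hypothetical sorted outputs of three map tasks for a single reduce partition.
map_outputs = [
    [("apple", 1), ("cherry", 1)],
    [("apple", 1), ("banana", 1)],
    [("banana", 1), ("cherry", 1), ("cherry", 1)],
]

result = list(run_reduce(map_outputs, lambda key, values: sum(values)))
print(result)
```

In this model the reduce function never needs all values of a key in memory at once, but the merge can only emit a key after every run that contains it is available to read, so the sorted runs themselves still have to be stored somewhere.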
Mohit Anchlia 2012-12-09, 17:15
jamal sasha 2012-12-09, 17:19