Re: can local disk of reduce task cause the job to fail?
Mohit Anchlia 2012-12-09, 17:15
The reducer will not start executing until the shuffle and sort phases are complete.
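
To illustrate why the reduce function has to wait, here is a toy Python sketch. It is not Hadoop's API, and the names `shuffle_and_merge` and `run_reduce` are made up for this example. The smallest key in sorted order could come from any map output, so the merged stream cannot be started safely until every map output has been fetched. The values for a key reach the reduce function through an iterator, so they do not all have to sit in memory at once. The fetched map outputs themselves still need space on the reducer's node (plain in-memory lists here; as far as I know, Hadoop spills them to local disk once they outgrow the shuffle buffer).

```python
import heapq
from itertools import groupby
from operator import itemgetter


def shuffle_and_merge(map_outputs):
    """Lazily merge per-map (key, value) runs, each already sorted by key."""
    # heapq.merge is a streaming k-way merge; it needs every run up front,
    # which mirrors "all map outputs fetched before reduce starts".
    return heapq.merge(*map_outputs, key=itemgetter(0))


def run_reduce(map_outputs, reduce_fn):
    """Call reduce_fn(key, values_iterator) once per key, in key order."""
    merged = shuffle_and_merge(map_outputs)
    results = {}
    for key, group in groupby(merged, key=itemgetter(0)):
        # Values are handed over as a generator, not a materialized list.
        results[key] = reduce_fn(key, (value for _, value in group))
    return results


# Hypothetical sorted outputs of three map tasks for one reduce partition.
map_outputs = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5)],
]

totals = run_reduce(map_outputs, lambda key, values: sum(values))
print(totals)
```

In this sketch the per-key values are consumed one at a time, but the whole partition still has to be present on the reducer's side before the first call to `reduce_fn`.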
On Dec 9, 2012, at 4:09 AM, Majid Azimi <[EMAIL PROTECTED]> wrote:
> Hi guys,
> Hadoop: The Definitive Guide says that reduce tasks start only when all maps have finished their work. Also, this link says:
> >> The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
> What I have understood is that when a reduce task starts, all the data it needs (including each key and its associated values) has already been transferred to its local node. Am I right? If this is true, then the node running the reduce task must have enough storage to hold all values associated with that key, or else the job will fail.
> If not, then the reduce task starts with whatever data is available, and the shuffle + sort phase feeds the reduce task continuously, so low storage on the node does not cause a problem because data arrives on demand.
> Which of the two cases actually happens?