Hadoop, mail # user - phases of Hadoop Jobs

phases of Hadoop Jobs
Nan Zhu 2011-09-19, 02:24
Hi, all

Recently I was struck by a question: "how is a Hadoop job divided into 2 phases?"

In textbooks, we are told that MapReduce jobs are divided into 2 phases,
map and reduce, and that reduce is further divided into 3 stages:
shuffle, sort, and reduce. But while reading the Hadoop code I never thought
about this question, and I did not see any member variables in the
JobInProgress class that indicate this information.

According to my understanding of the Hadoop source code, reduce tasks do not
necessarily wait until all mappers have finished before they start. On the
contrary, we can see reduce tasks in the shuffle stage while some mappers
are still running.
So how can I tell which phase the job is currently in?
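
To make the question concrete, here is a minimal sketch in plain Java. It is not Hadoop's API: the `Phase` enum, the `JobPhases` class, and the `activePhases` method are hypothetical names used only for illustration. It assumes each running task reports its own phase, and it shows why a single job-level "phase" is ambiguous when mappers and shuffling reducers overlap.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; this is an illustration, not Hadoop code.
enum Phase { MAP, SHUFFLE, SORT, REDUCE }

final class JobPhases {
    // Collect the distinct phases of all tasks that are currently running.
    static Set<Phase> activePhases(List<Phase> runningTasks) {
        Set<Phase> active = EnumSet.noneOf(Phase.class);
        active.addAll(runningTasks);
        return active;
    }

    public static void main(String[] args) {
        // Two mappers are still running while one reducer is already shuffling.
        List<Phase> running = List.of(Phase.MAP, Phase.MAP, Phase.SHUFFLE);
        System.out.println(activePhases(running)); // prints [MAP, SHUFFLE]
    }
}
```

In this sketch the job is in both MAP and SHUFFLE at the same moment, which is the situation described above.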

Nan Zhu
School of Electronic, Information and Electrical Engineering,229
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China