In previous discussions, I found the following description:
"mapreduce.job.ubertask.enable | (false) | 'Whether to enable the
small-jobs "ubertask" optimization, which runs "sufficiently small" jobs
sequentially within a single JVM. "Small" is defined by the following
maxmaps, maxreduces, and maxbytes settings. Users may override this value.'"
Based on the description above, I set "mapreduce.job.ubertask.enable" to true,
configured the other uber-related parameters, and then ran some experiments.
I arrived at the following understanding.
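For reference, the uber-related settings I configured look roughly like the fragment below in mapred-site.xml. The values shown are illustrative defaults, not recommendations:

```xml
<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
<property>
  <!-- default is 9; users may only lower it -->
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value>
</property>
<property>
  <!-- default is 1; MR2 supports at most one reduce in uber mode -->
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
<property>
  <!-- defaults to the HDFS block size; 134217728 = 128 MB shown as an example -->
  <name>mapreduce.job.ubertask.maxbytes</name>
  <value>134217728</value>
</property>
```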
1) If I submit a bunch of small MR jobs to the Hadoop cluster (each MR job
runs in uber mode):
- Each MR job corresponds to one application.
- Each application has its own container.
- When a container is launched by the NodeManager, it launches a JVM too.
When the container stops, the JVM stops as well. A container has exactly
one JVM over its whole life cycle.
- Each application, e.g. application_1383815949546_0006, includes some map
tasks and reduce tasks.
- In uber mode, all the map tasks and reduce tasks of
application_1383815949546_0006 are executed in one and the same
container, container_1383815949546_0010_01_000001. This also means that
all map tasks and reduce tasks are executed in a single JVM.
- A container cannot be shared among different applications (jobs).
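The "sufficiently small" test from the quoted docs can be sketched as below. This is a simplified illustration, not the actual MRAppMaster code; the real decision also considers whether the job fits in the ApplicationMaster's memory/CPU allocation, and the method name here is hypothetical:

```java
// Simplified sketch of the uber decision: a job qualifies only if it stays
// within the maxmaps, maxreduces, and maxbytes thresholds simultaneously.
public class UberCheck {
    static boolean isSmallEnoughForUber(int maps, int reduces, long inputBytes,
                                        int maxMaps, int maxReduces, long maxBytes) {
        return maps <= maxMaps && reduces <= maxReduces && inputBytes <= maxBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assume a 128 MB HDFS block

        // 3 maps, 1 reduce, 10 MB of input: within the defaults (9 / 1 / block size)
        System.out.println(isSmallEnoughForUber(3, 1, 10L * 1024 * 1024, 9, 1, blockSize));
        // prints "true"

        // 50 maps exceeds maxmaps = 9: the job runs normally, one container per task
        System.out.println(isSmallEnoughForUber(50, 1, 10L * 1024 * 1024, 9, 1, blockSize));
        // prints "false"
    }
}
```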
2) If I submit a bunch of big MR jobs to the Hadoop cluster (each MR job
does NOT run in uber mode):
- Each map task and reduce task of application_1383815949546_0006 is
executed in its own container, which means that
application_1383815949546_0006 will have many containers.
I am not sure whether the above understandings are correct, so any
comments/corrections will be appreciated!