

Thread (MapReduce user mailing list):
  sam liu       2013-06-07, 03:15
  Sandy Ryza    2013-06-07, 06:53
  Harsh J       2013-06-08, 15:09
  sam liu       2013-06-09, 08:51
  Harsh J       2013-06-09, 13:03
  Michel Segel  2013-06-18, 10:11
  Sandy Ryza    2013-10-22, 23:45
Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Hi Harsh,

Thanks for your detailed response! The efficiency of my YARN cluster has
improved a lot since I increased the reducer count (mapreduce.job.reduces)
in mapred-site.xml.
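For reference, the change was along these lines in mapred-site.xml (the
value 16 is only an example, not my exact setting):

  <property>
    <name>mapreduce.job.reduces</name>
    <value>16</value>
  </property>

But I still have some questions about how YARN executes an MRv1 job: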

1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
together, following a typical process (map > shuffle > reduce). In YARN,
as I understand it, an MRv1 job is executed only by the ApplicationMaster.
- YARN can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has
a special execution process (map > shuffle > reduce) in Hadoop 1.x. How
does YARN execute an MRv1 job? Does it still include the MR-specific steps
of Hadoop 1.x, like map, sort, merge, combine and shuffle?
- Do the MRv1 parameters still work for YARN, like
mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent? (See the
snippet after this question.)
- What is the general process by which the YARN ApplicationMaster executes
a job?
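
(To make the parameter question concrete, this is the kind of setting I
mean; the names are the MR2 ones and the values shown are just the stock
defaults:)

  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
  </property>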

2. In Hadoop 1.x, we can set the map/reduce slots by setting
'mapred.tasktracker.map.tasks.maximum' and
'mapred.tasktracker.reduce.tasks.maximum'.
- For YARN, the above two parameters no longer work, because YARN uses
containers instead, right?
- For YARN, we can set the total physical memory of a NodeManager using
'yarn.nodemanager.resource.memory-mb'. But how do we set the default
physical memory size of a container?
- How do we set the maximum physical memory size of a container? Via the
parameter 'mapred.child.java.opts'? (The settings I am looking at are
sketched below.)
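
(For concreteness, these are the properties I am asking about; the names
come from the Hadoop 2 documentation and the values are illustrative
guesses, not recommendations:)

  yarn-site.xml:
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>   <!-- total per NM -->
    <value>12000</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>  <!-- max per container -->
    <value>8192</value>
  </property>

  mapred-site.xml:
  <property>
    <name>mapreduce.map.memory.mb</name>       <!-- per map container -->
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>    <!-- per reduce container -->
    <value>1024</value>
  </property>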

Thanks as always!

2013/6/9 Harsh J <[EMAIL PROTECTED]>

> Hi Sam,
>
> > - How do we know the container number? Why do you say it will be 22
> > containers due to 22 GB of memory?
>
> MR2's default configuration requests 1 GB of memory each for the map
> and reduce containers. It additionally requests 1.5 GB for the AM
> container that runs the job. All of this is tunable using the
> properties Sandy mentioned in his post.
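>
> In property terms, those defaults correspond to the settings sketched
> below (stock 2.x names and values); at these defaults, a 22 GB NM
> resource fits roughly 22 GB / 1 GB = 22 concurrent task containers:
>
>   <property>
>     <name>mapreduce.map.memory.mb</name>
>     <value>1024</value>
>   </property>
>   <property>
>     <name>mapreduce.reduce.memory.mb</name>
>     <value>1024</value>
>   </property>
>   <property>
>     <name>yarn.app.mapreduce.am.resource.mb</name>
>     <value>1536</value>
>   </property>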
>
> > - My machine has 32 GB of memory; how much memory is appropriate to
> > assign to containers?
>
> This is a general question. You can use the same process you used to
> decide the optimal number of slots in MR1 to decide this here. Every
> container is a new JVM, and you're limited by the CPUs you have there
> (if not by the memory). Either increase the memory requested by jobs,
> to lower the number of concurrent containers at a given time (a runtime
> change), or lower the NM's published memory resources to control the
> same (a config change).
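>
> Concretely, the two options could look like the sketch below; the
> numbers and paths are made up for illustration:
>
>   # Runtime change: request bigger containers for a single job
>   hadoop jar hadoop-mapreduce-examples.jar terasort \
>     -Dmapreduce.map.memory.mb=2048 \
>     -Dmapreduce.reduce.memory.mb=2048 \
>     /teraIn /teraOut
>
>   <!-- Config change: publish less memory per NM, in yarn-site.xml -->
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>16384</value>
>   </property>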
>
> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn',
> > will the other mapred-site.xml parameters still work under the YARN
> > framework? Like 'mapreduce.task.io.sort.mb' and
> > 'mapreduce.map.sort.spill.percent'
>
> Yes, all of those properties will still work. Old properties specific
> to the JobTracker or TaskTracker (usually identifiable by those
> keywords in the property name) no longer apply.
>
> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <[EMAIL PROTECTED]> wrote:
> > Hi Harsh,
> >
> > According to the above suggestions, I removed the duplicated settings
> > and reduced the values of 'yarn.nodemanager.resource.cpu-cores',
> > 'yarn.nodemanager.vcores-pcores-ratio' and
> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. After that,
> > the efficiency improved by about 18%. I have some questions:
> >
> > - How do we know the container number? Why do you say it will be 22
> > containers due to 22 GB of memory?
> > - My machine has 32 GB of memory; how much memory is appropriate to
> > assign to containers?
> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn',
> > will the other mapred-site.xml parameters still work under the YARN
> > framework? Like 'mapreduce.task.io.sort.mb' and
> > 'mapreduce.map.sort.spill.percent'
> >
> > Thanks!
> >
> >
> >
> > 2013/6/8 Harsh J <[EMAIL PROTECTED]>
> >>
> >> Hey Sam,
> >>
> >> Did you get a chance to retry with Sandy's suggestions? The config
> >> appears to be asking NMs to use roughly 22 total containers (as
> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
> >> resource. This could impact much, given the CPU is still the same for