MapReduce >> mail # user >> Re: Why my tests shows Yarn is worse than MRv1 for terasort?


Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Sam,
I think your cluster is too small for any meaningful conclusions to be made.
Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 18, 2013, at 3:58 AM, sam liu <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Thanks for your detailed response! The efficiency of my Yarn cluster improved a lot after I increased the number of reducers (mapreduce.job.reduces) in mapred-site.xml. But I still have some questions about how Yarn executes an MRv1 job:
>
> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together, following a typical process (map > shuffle > reduce). In Yarn, as far as I know, an MRv1 job is executed only by the ApplicationMaster.
> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x. How does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
> - Do the MRv1 parameters still work for Yarn, like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
> - What is the general process for the Yarn ApplicationMaster to execute a job?
>
> 2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'.
> - For Yarn, the above two parameters do not work any more, as Yarn uses containers instead, right?
> - For Yarn, we can set the total physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default physical memory size of a container?
> - How do we set the maximum physical memory size of a container? With the 'mapred.child.java.opts' parameter?
>
> Thanks as always!
>
> 2013/6/9 Harsh J <[EMAIL PROTECTED]>
>> Hi Sam,
>>
>> > - How do I know the number of containers? Why do you say it will be 22 containers with 22 GB of memory?
>>
>> MR2's default configuration requests 1 GB of memory for each Map
>> and Reduce container. It additionally requests 1.5 GB for the AM
>> container that runs the job. These values are tunable using the
>> properties Sandy mentioned in his post.
>>
>> > - My machine has 32 GB of memory; how much memory is appropriate to assign to containers?
>>
>> This is a general question. You may use the same process you used to
>> decide the optimal number of slots in MR1 to decide this here. Every
>> container is a new JVM, and you're limited by the CPUs you have there
>> (if not the memory). Either increase the memory requests from jobs to
>> lower the number of concurrent containers at a given time (a runtime
>> change), or lower the NM's published memory resources to control the
>> same (a config change).
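
The container arithmetic described above (22 GB of NodeManager memory and 1 GB per task giving 22 containers) can be sketched roughly as follows. This is only an illustration, not how the scheduler is actually implemented: the function names are hypothetical, and the rounding of each request up to a multiple of a 1024 MB minimum allocation is an assumption, not something stated in the thread. CPU limits are ignored entirely.

```python
def normalize_request_mb(request_mb, min_alloc_mb=1024):
    """Round a memory request up to a multiple of the minimum allocation.

    Assumption: the scheduler rounds requests up this way, and 1024 MB is
    an assumed minimum allocation, not a value taken from the thread.
    """
    return -(-request_mb // min_alloc_mb) * min_alloc_mb


def containers_per_node(node_memory_mb, request_mb, min_alloc_mb=1024):
    """Memory-only upper bound on concurrent containers on one NodeManager."""
    return node_memory_mb // normalize_request_mb(request_mb, min_alloc_mb)


if __name__ == "__main__":
    # 22 GB published by the NodeManager, 1 GB per map/reduce container.
    print(containers_per_node(22 * 1024, 1024))   # 22
    # Runtime change: jobs request 2 GB each, so fewer containers run at once.
    print(containers_per_node(22 * 1024, 2048))   # 11
    # Config change: the NodeManager publishes only 12000 MB.
    print(containers_per_node(12000, 1024))       # 11
    # The 1.5 GB AM request under the assumed rounding rule.
    print(normalize_request_mb(1536))             # 2048
```

Both levers mentioned above reduce concurrency in this model: raising the per-job request shrinks the quotient at runtime, while lowering the published NodeManager memory shrinks it through configuration.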
>>
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will the other parameters in mapred-site.xml still work in the Yarn framework? Like 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>
>> Yes, all of these properties will still work. Old properties specific
>> to JobTracker or TaskTracker (usually found as a keyword in the config
>> name) will not apply anymore.
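
As a rough illustration of the rule of thumb above, the following sketch separates property names that mention JobTracker or TaskTracker from the rest. It is a heuristic only, since the keyword is described as "usually" present in the name; the helper names are hypothetical, and the sample property names are the ones quoted elsewhere in this thread.

```python
# Keywords that usually mark MR1 daemon-specific properties.
MR1_DAEMON_KEYWORDS = ("jobtracker", "tasktracker")


def is_mr1_daemon_property(name):
    """Heuristic: True if the property name mentions an MR1 daemon."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in MR1_DAEMON_KEYWORDS)


def split_properties(names):
    """Return (likely still applied, likely ignored) under Yarn."""
    still_applied = [n for n in names if not is_mr1_daemon_property(n)]
    ignored = [n for n in names if is_mr1_daemon_property(n)]
    return still_applied, ignored


if __name__ == "__main__":
    sample = [
        "mapreduce.task.io.sort.mb",
        "mapreduce.map.sort.spill.percent",
        "mapred.tasktracker.map.tasks.maximum",
        "mapred.tasktracker.reduce.tasks.maximum",
    ]
    still_applied, ignored = split_properties(sample)
    print("still applied:", still_applied)
    print("ignored:", ignored)
```

A name-based check like this can only flag candidates for review; it does not confirm how any particular Hadoop release treats a given property.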
>>
>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <[EMAIL PROTECTED]> wrote:
>> > Hi Harsh,
>> >
>> > Following the above suggestions, I removed the duplicated settings and
>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. After that, the
>> > efficiency improved by about 18%. I have some questions:
>> >
>> > - How do I know the number of containers? Why do you say it will be 22
>> > containers with 22 GB of memory?
>> > - My machine has 32 GB of memory; how much memory is appropriate to assign
>> > to containers?
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will
>> > the other parameters in mapred-site.xml still work in the Yarn framework?
>> > Like 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>> >
>> > Thanks!
>> >
>> >
>> >
>> > 2013/6/8 Harsh J <[EMAIL PROTECTED]>
>>