Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Michel Segel 2013-06-18, 10:11
I think your cluster is too small for any meaningful conclusions to be made.
On Jun 18, 2013, at 3:58 AM, sam liu <[EMAIL PROTECTED]> wrote:
> Hi Harsh,
> Thanks for your detailed response! The efficiency of my Yarn cluster improved a lot after I increased the number of reducers (mapreduce.job.reduces) in mapred-site.xml. But I still have some questions about how Yarn executes an MRv1 job:
> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together, following the typical process (map > shuffle > reduce). In Yarn, as far as I know, an MRv1 job is executed only by the ApplicationMaster.
> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x. How does Yarn execute an MRv1 job? Does it still include the MR-specific steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
> - Do the MRv1 parameters still work for Yarn, like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
> - What is the general process by which the Yarn ApplicationMaster executes a job?
> 2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'.
> - For Yarn, the above two parameters do not work any more, as Yarn uses containers instead, right?
> - For Yarn, we can set the total physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default physical memory size of a container?
> - How do we set the maximum physical memory size of a container? With the 'mapred.child.java.opts' parameter?
> Thanks as always!
> 2013/6/9 Harsh J <[EMAIL PROTECTED]>
>> Hi Sam,
>> > - How do I know the number of containers? Why do you say it will be 22 containers given 22 GB of memory?
>> MR2's default configuration requests 1 GB of memory each for Map
>> and Reduce containers. It additionally requests 1.5 GB for the AM
>> container that runs the job. This is tunable using the properties
>> Sandy mentioned in his post.
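The arithmetic behind the "22 containers for 22 GB" figure can be sketched as follows. This is a rough illustration only: the function name and parameters are hypothetical, the defaults are the 1 GB task and 1.5 GB AM requests described above, and the sketch ignores any rounding or per-request limits the scheduler may apply.

```python
def max_concurrent_containers(node_memory_mb, container_mb=1024,
                              am_mb=1536, am_on_node=False):
    """Estimate how many task containers fit on one NodeManager.

    node_memory_mb: memory the NodeManager publishes
                    (yarn.nodemanager.resource.memory-mb).
    container_mb:   memory requested per map/reduce container (assumed 1 GB).
    am_mb:          memory requested by the job's AM container (assumed 1.5 GB).
    am_on_node:     whether the AM container happens to run on this node.
    """
    available = node_memory_mb - (am_mb if am_on_node else 0)
    # Integer division: a partially fitting container cannot be started.
    return max(available // container_mb, 0)


if __name__ == "__main__":
    print(max_concurrent_containers(22 * 1024))            # 22 GB published
    print(max_concurrent_containers(12000))                # 12000 MB published
    print(max_concurrent_containers(12000, container_mb=2048))
```

Raising the per-container request or lowering the memory the NodeManager publishes both reduce the result, which is how the number of concurrent containers per node is controlled.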
>> > - My machine has 32 GB of memory; how much memory is appropriate to assign to containers?
>> This is a general question. You may use the same process you used to
>> decide the optimal number of slots in MR1 to decide this here. Every
>> container is a new JVM, and you're limited by the CPUs you have there
>> (if not the memory). Either increase memory requests from jobs, to
>> lower # of concurrent containers at a given time (runtime change), or
>> lower NM's published memory resources to control the same (config
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will the other parameters in mapred-site.xml still work in the yarn framework? Like 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'?
>> Yes, all of these properties will still work. Old properties specific
>> to JobTracker or TaskTracker (usually found as a keyword in the config
>> name) will not apply anymore.
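The keyword rule of thumb above can be sketched as a small filter. This is only a heuristic illustration, not how Hadoop itself decides which properties apply; the function name and keyword list are hypothetical, and property names that do not contain either keyword literally will not be caught.

```python
# Keywords that, per the rule of thumb above, mark a property as specific
# to the MR1 daemons (assumption: plain substring match on the name).
LEGACY_KEYWORDS = ("jobtracker", "tasktracker")


def split_properties(names):
    """Split property names into (still_applies, daemon_specific)."""
    still_applies, daemon_specific = [], []
    for name in names:
        if any(keyword in name.lower() for keyword in LEGACY_KEYWORDS):
            daemon_specific.append(name)
        else:
            still_applies.append(name)
    return still_applies, daemon_specific


if __name__ == "__main__":
    kept, dropped = split_properties([
        "mapreduce.task.io.sort.mb",
        "mapreduce.map.sort.spill.percent",
        "mapred.tasktracker.map.tasks.maximum",
        "mapred.tasktracker.reduce.tasks.maximum",
    ])
    print("still applies:", kept)
    print("JobTracker/TaskTracker specific:", dropped)
```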
>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <[EMAIL PROTECTED]> wrote:
>> > Hi Harsh,
>> > Following the above suggestions, I removed the duplicate settings and
>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. After that, the
>> > efficiency improved by about 18%. I have some questions:
>> > - How do I know the number of containers? Why do you say it will be 22
>> > containers given 22 GB of memory?
>> > - My machine has 32 GB of memory; how much memory is appropriate to assign
>> > to containers?
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will the
>> > other parameters in mapred-site.xml still work in the yarn framework? Like
>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'?
>> > Thanks!
>> > 2013/6/8 Harsh J <[EMAIL PROTECTED]>