HDFS, mail # user - RE: How Yarn execute MRv1 job?


Re: How Yarn execute MRv1 job?
sam liu 2013-06-20, 06:11
Thanks Arun!

#1, Yes, I did tests and found that MRv1 jobs can run against YARN
directly, without recompiling.

#2, do you mean that old versions of HBase/Hive cannot run against YARN, and
only certain versions of them can? If so, how can I get the versions that
work with YARN?
2013/6/20 Arun C Murthy <[EMAIL PROTECTED]>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <[EMAIL PROTECTED]> wrote:
>
> Appreciating for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and MRv1 jobs can run on Yarn.
> If Yarn does not require any code change to an existing MRv1 job, why would
> we need to recompile it?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop-related components (HBase
> 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want to replace Hadoop
> 1.1.1 with Yarn, do we need to recompile all the other Hadoop-related
> components against the Yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but you will not need to change any of your end-user
> applications (MR jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <[EMAIL PROTECTED]>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the resource request for a Yarn container, currently only
>>> memory is considered as a resource, not CPU. Please correct me if I am wrong.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <[EMAIL PROTECTED]> wrote:
>>>
>>>>  Hi Sam,
>>>>
>>>>   Please find the answers for your queries.
>>>>
>>>>
>>>> >- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x.
>>>> How does Yarn execute an MRv1 job? Does it still include the special MR
>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>
>>>>
>>>> In Yarn, there is a concept of an application. An MR Job is one kind
>>>> of application which makes use of MRAppMaster (i.e. the
>>>> ApplicationMaster for that application). If we want to run different
>>>> kinds of applications, we need an ApplicationMaster for each kind of
>>>> application.
>>>>
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>
>>>> These configurations still work for MR Jobs in Yarn.
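For example, the two MRv1-era tuning knobs mentioned above can still be set in mapred-site.xml when the job runs against YARN (the values here are purely illustrative, not recommendations):

```xml
<!-- mapred-site.xml: MRv1-era shuffle tuning, still honored by MR jobs on YARN -->
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>   <!-- size in MB of the map-side sort buffer -->
  </property>
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>  <!-- spill to disk once the buffer is 80% full -->
  </property>
</configuration>
```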
>>>>
>>>>
>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>> execute a job?
>>>>
>>>> MRAppMaster (the ApplicationMaster for an MR Job) manages the job
>>>> life cycle, which includes getting containers for the maps & reducers,
>>>> launching the containers via the NM, tracking task status till
>>>> completion, and managing failed tasks.
>>>>
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> >- For Yarn, the above two parameters do not work any more, as Yarn
>>>> uses containers instead, right?
>>>>
>>>> Correct, these params don't work in Yarn. In Yarn it is completely
>>>> based on resources (memory, CPU). The Application Master can request
>>>> resources from the RM to complete the tasks for that application.
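A minimal sketch of the resource-based model that replaces slots (property names are from Hadoop 2.x; the values are illustrative only):

```xml
<!-- yarn-site.xml: total memory a NodeManager offers to the RM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml: per-task container sizes the MRAppMaster requests from the RM -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```

Instead of a fixed number of map/reduce slots per TaskTracker, the number of concurrent containers on a node falls out of the division of the NodeManager's total resources by the per-container requests.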
>>>>
>>>>
>>>> >- For Yarn, we can set the whole physical memory for a NodeManager
>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>> default size of physical memory for a container?
>>>>
>>>> ApplicationMaster is responsible for getting the containers from RM by
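Regarding the question above about default container size: in Hadoop 2.x the scheduler's allocation bounds effectively set the floor, ceiling, and rounding granularity for every container the RM grants (the values below are illustrative, not recommendations):

```xml
<!-- yarn-site.xml: RM-side bounds on any single container allocation -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>  <!-- smallest container the RM will grant; requests are rounded up to a multiple of this -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>  <!-- largest container a single request may ask for -->
</property>
```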