Re: How Yarn execute MRv1 job?
Hi Sam,
Please look at: http://hbase.apache.org/book.html#d2617e499

Generally, when we say YARN we mean Hadoop 2.x; you can download hadoop-2.0.4-alpha.
Hive-0.10 supports hadoop-2.x very well.

On Thu, Jun 20, 2013 at 2:11 PM, sam liu <[EMAIL PROTECTED]> wrote:

> Thanks Arun!
>
> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
> directly, without recompiling
>
> #2, do you mean the old versions of HBase/Hive cannot run against YARN,
> and only some special versions of them can run against YARN? If yes, how
> can I get the versions for YARN?
>
>
> 2013/6/20 Arun C Murthy <[EMAIL PROTECTED]>
>
>>
>> On Jun 19, 2013, at 6:45 PM, sam liu <[EMAIL PROTECTED]> wrote:
>>
>> I appreciate the detailed answers! Here are three further questions:
>>
>> - Yarn maintains backwards compatibility, and MRv1 jobs can run on Yarn.
>> If Yarn does not require any code changes to existing MRv1 jobs, why
>> should we recompile them?
>>
>>
>> You don't need to recompile MRv1 jobs to run against YARN.
>>
>> - Which Yarn jar files are required for the recompiling?
>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related
>> components (HBase 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want to
>> replace Hadoop 1.1.1 with Yarn, do we need to recompile all the other
>> Hadoop-related components against the Yarn jar files, without any code change?
>>
>>
>> You will need versions of HBase, Hive etc. which are integrated with
>> hadoop-2.x, but you do not need to change any of your end-user applications
>> (MR jobs, hive queries, pig scripts etc.)
>>
>> Arun
>>
>>
>> Thanks in advance!
>>
>>
>>
>> 2013/6/19 Rahul Bhattacharjee <[EMAIL PROTECTED]>
>>
>>> Thanks Arun and Devaraj, good to know.
>>>
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>
>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi Devaraj,
>>>>
>>>> As for the container request for a yarn container, currently
>>>> only memory is considered as a resource, not cpu. Please correct me if I am wrong.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> Please find the answers to your queries.
>>>>>
>>>>>
>>>>> >- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x.
>>>>> How does Yarn execute an MRv1 job? Does it still include the special MR
>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>
>>>>>
>>>>>
>>>>> Yarn is built around the concept of an application. An MR job is one
>>>>> kind of application, which makes use of MRAppMaster (i.e. the
>>>>> ApplicationMaster for that application). If we want to run different kinds
>>>>> of applications, we need an ApplicationMaster for each kind of application.
>>>>>
>>>>>
>>>>>
>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>
>>>>> These configurations still work for MR jobs in Yarn.
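
As a minimal sketch of how the two properties named above might be written out for an MR job on Yarn, the snippet below renders a Hadoop-style configuration fragment. The property names come from the thread; the values 100 and 0.80 are assumptions chosen for illustration, not recommendations made here, and `render_site_xml` is a hypothetical helper rather than part of any Hadoop tooling.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Property names are taken from the thread; the values are assumed for illustration.
sort_props = {
    "mapreduce.task.io.sort.mb": 100,
    "mapreduce.map.sort.spill.percent": 0.80,
}

site_xml = render_site_xml(sort_props)
print(site_xml)
```

The resulting fragment would typically be merged into the job's or cluster's MapReduce configuration file; where exactly it belongs depends on the deployment.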
>>>>>
>>>>>
>>>>> >- What's the general process for the ApplicationMaster of Yarn to execute
>>>>> a job?
>>>>>
>>>>> MRAppMaster (the Application Master for an MR job) manages the job life
>>>>> cycle, which includes getting the containers for maps & reducers, launching
>>>>> the containers using the NM, tracking the task status till completion, and
>>>>> managing the failed tasks.
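
The life cycle described above (get containers, launch them, track status, handle failures) can be pictured with a toy model. This is only a sketch under stated assumptions: `run_job`, `flaky_launch`, the container naming, and the retry limit are all hypothetical and are not YARN or MRAppMaster APIs.

```python
from collections import deque


def run_job(task_ids, launch, max_attempts=2):
    """Toy model: obtain a container per task attempt, launch it, track the
    outcome, and reschedule failed tasks until max_attempts is reached."""
    pending = deque((task, 1) for task in task_ids)
    status = {}
    next_container = 0
    while pending:
        task, attempt = pending.popleft()
        next_container += 1  # stands in for a container granted by the RM
        container = f"container_{next_container}"
        ok = launch(task, container)  # stands in for launching through the NM
        if ok:
            status[task] = "SUCCEEDED"
        elif attempt < max_attempts:
            pending.append((task, attempt + 1))  # failed task is rescheduled
        else:
            status[task] = "FAILED"
    return status


# Hypothetical launcher: map_1 fails on its first attempt only.
already_failed = set()


def flaky_launch(task, container):
    if task == "map_1" and task not in already_failed:
        already_failed.add(task)
        return False
    return True


result = run_job(["map_0", "map_1", "reduce_0"], flaky_launch)
print(result)
```

A real ApplicationMaster negotiates containers asynchronously and runs tasks in parallel; the sequential loop here only mirrors the order of responsibilities listed in the answer.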
>>>>>
>>>>>
>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> >- For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>>
>>>>> Correct, these params don't work in Yarn. In Yarn it is based entirely
>>>>> on resources (memory, cpu). The Application Master can request the RM