Re: How Yarn execute MRv1 job?
Hi Azuryy,

So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
do not support Hadoop 2.x, right?

Thanks!
2013/6/20 Azuryy Yu <[EMAIL PROTECTED]>

> Hi Sam,
> please look at: http://hbase.apache.org/book.html#d2617e499
>
> generally, when we say YARN we mean Hadoop 2.x; you can download
> hadoop-2.0.4-alpha. And Hive 0.10 supports hadoop-2.x very well.
>
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <[EMAIL PROTECTED]> wrote:
>
>> Thanks Arun!
>>
>> #1, Yes, I did some tests and found that MRv1 jobs can run against YARN
>> directly, without recompiling.
>>
>> #2, do you mean that old versions of HBase/Hive cannot run against
>> YARN, and only certain versions of them can? If yes, how can I get the
>> versions that work with YARN?
>>
>>
>> 2013/6/20 Arun C Murthy <[EMAIL PROTECTED]>
>>
>>>
>>> On Jun 19, 2013, at 6:45 PM, sam liu <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for the detailed answers! Here are three further questions:
>>>
>>> - Yarn maintains backwards compatibility, so MRv1 jobs can run on
>>> Yarn. If Yarn does not require any code changes to existing MRv1 jobs,
>>> why should we recompile them?
>>>
>>>
>>> You don't need to recompile MRv1 jobs to run against YARN.
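>>>
>>> For example, a driver written against the old org.apache.hadoop.mapred
>>> API keeps working as-is. The sketch below is illustrative only (the
>>> class name is made up, and it relies on the identity mapper/reducer
>>> defaults): compile it once against 1.x, set mapreduce.framework.name to
>>> "yarn" in mapred-site.xml, and submit the same jar with 'hadoop jar'.
>>>
>>>   import org.apache.hadoop.fs.Path;
>>>   import org.apache.hadoop.io.LongWritable;
>>>   import org.apache.hadoop.io.Text;
>>>   import org.apache.hadoop.mapred.FileInputFormat;
>>>   import org.apache.hadoop.mapred.FileOutputFormat;
>>>   import org.apache.hadoop.mapred.JobClient;
>>>   import org.apache.hadoop.mapred.JobConf;
>>>
>>>   public class Mrv1PassThrough {
>>>     public static void main(String[] args) throws Exception {
>>>       JobConf conf = new JobConf(Mrv1PassThrough.class); // old MRv1 API
>>>       conf.setJobName("mrv1-on-yarn-demo");
>>>       // No mapper/reducer set: the old API defaults to IdentityMapper
>>>       // and IdentityReducer, so the job just passes records through.
>>>       conf.setOutputKeyClass(LongWritable.class);
>>>       conf.setOutputValueClass(Text.class);
>>>       FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>       FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>       // With mapreduce.framework.name=yarn, this same binary is
>>>       // submitted to the YARN cluster instead of a JobTracker.
>>>       JobClient.runJob(conf);
>>>     }
>>>   }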
>>>
>>> - Which Yarn jar files are required for the recompiling?
>>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related
>>> components (HBase 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we want
>>> to replace Hadoop 1.1.1 with Yarn, do we need to recompile all the
>>> other Hadoop-related components against the Yarn jar files? Without any
>>> code change?
>>>
>>>
>>> You will need versions of HBase, Hive etc. which are integrated with
>>> hadoop-2.x, but you will not need to change any of your end-user
>>> applications (MR jobs, Hive queries, Pig scripts etc.)
>>>
>>> Arun
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>> 2013/6/19 Rahul Bhattacharjee <[EMAIL PROTECTED]>
>>>
>>>> Thanks Arun and Devaraj, good to know.
>>>>
>>>>
>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
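>>>>>
>>>>> For instance, a container request in the 2.x client API carries both
>>>>> dimensions (an illustrative fragment; the numbers are arbitrary). To
>>>>> have the CapacityScheduler compare CPU as well as memory, point
>>>>> yarn.scheduler.capacity.resource-calculator at
>>>>> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in
>>>>> capacity-scheduler.xml.
>>>>>
>>>>>   import org.apache.hadoop.yarn.api.records.Priority;
>>>>>   import org.apache.hadoop.yarn.api.records.Resource;
>>>>>   import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
>>>>>
>>>>>   // Ask for 2048 MB of memory AND 2 virtual cores in one request.
>>>>>   Resource capability = Resource.newInstance(2048, 2);
>>>>>   ContainerRequest req =
>>>>>       new ContainerRequest(capability, null, null, Priority.newInstance(0));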
>>>>>
>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Hi Devaraj,
>>>>>
>>>>> As for the resource request for a Yarn container, currently only
>>>>> memory is considered as a resource, not CPU. Please correct me if I
>>>>> am wrong.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> Please find the answers for your queries.
>>>>>>
>>>>>>
>>>>>> >- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>> job has a special execution process (map > shuffle > reduce) in
>>>>>> Hadoop 1.x. How does Yarn execute an MRv1 job? Does it still include
>>>>>> the special MR steps of Hadoop 1.x, like map, sort, merge, combine
>>>>>> and shuffle?
>>>>>>
>>>>>> In Yarn, there is the concept of an application. An MR job is one
>>>>>> kind of application, which makes use of MRAppMaster (i.e., the
>>>>>> ApplicationMaster for that application). If we want to run different
>>>>>> kinds of applications, we should have an ApplicationMaster for each
>>>>>> kind of application.
>>>>>>
>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>
>>>>>> These configurations still work for MR jobs in Yarn.
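>>>>>>
>>>>>> For example, they can be set per job from a 2.x client just as
>>>>>> before (a small illustrative fragment; the values are arbitrary):
>>>>>>
>>>>>>   import org.apache.hadoop.conf.Configuration;
>>>>>>   import org.apache.hadoop.mapreduce.Job;
>>>>>>
>>>>>>   Configuration conf = new Configuration();
>>>>>>   // The same 1.x-era tuning knobs, honoured by the MR runtime on Yarn.
>>>>>>   conf.setInt("mapreduce.task.io.sort.mb", 256);
>>>>>>   conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
>>>>>>   Job job = Job.getInstance(conf, "tuned-mr-job");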
>>>>>>
>>>>>>
>>>>>> >- What is the general process for a Yarn ApplicationMaster to
>>>>>> execute a job?
>>>>>>
>>>>>> MRAppMaster (the ApplicationMaster for an MR job) manages the job
>>>>>> life cycle, which includes getting the containers for the maps &
>>>>>> reducers, launching the containers via the NM, tracking the tasks'
>>>>>> status till completion, and managing failed tasks.
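>>>>>>
>>>>>> Roughly, an ApplicationMaster follows the shape of the sketch below,
>>>>>> written against the YARN client API. It is illustrative only, not
>>>>>> the real MRAppMaster code: it only works when launched by the RM
>>>>>> inside an AM container, and error handling, launching the task
>>>>>> processes via NMClient, and retrying failed tasks are all omitted.
>>>>>>
>>>>>>   import java.util.List;
>>>>>>   import org.apache.hadoop.conf.Configuration;
>>>>>>   import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
>>>>>>   import org.apache.hadoop.yarn.api.records.Container;
>>>>>>   import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
>>>>>>   import org.apache.hadoop.yarn.api.records.Priority;
>>>>>>   import org.apache.hadoop.yarn.api.records.Resource;
>>>>>>   import org.apache.hadoop.yarn.client.api.AMRMClient;
>>>>>>   import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
>>>>>>
>>>>>>   public class MiniAppMaster {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>       AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
>>>>>>       rm.init(new Configuration());
>>>>>>       rm.start();
>>>>>>       // 1. Register with the ResourceManager.
>>>>>>       rm.registerApplicationMaster("", 0, "");
>>>>>>       // 2. Request one container per task (memory + vcores).
>>>>>>       Resource cap = Resource.newInstance(1024, 1);
>>>>>>       for (int i = 0; i < 4; i++) {
>>>>>>         rm.addContainerRequest(
>>>>>>             new ContainerRequest(cap, null, null, Priority.newInstance(0)));
>>>>>>       }
>>>>>>       // 3. Heartbeat loop: collect granted containers; a real AM
>>>>>>       //    would launch task JVMs on them via NMClient and track
>>>>>>       //    their status to completion.
>>>>>>       int granted = 0;
>>>>>>       while (granted < 4) {
>>>>>>         AllocateResponse resp = rm.allocate(granted / 4.0f);
>>>>>>         List<Container> got = resp.getAllocatedContainers();
>>>>>>         granted += got.size();
>>>>>>         Thread.sleep(1000);
>>>>>>       }
>>>>>>       // 4. Unregister once every task has completed.
>>>>>>       rm.unregisterApplicationMaster(
>>>>>>           FinalApplicationStatus.SUCCEEDED, "all tasks done", "");
>>>>>>       rm.stop();
>>>>>>     }
>>>>>>   }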
>>>>>>
>>>>>>
>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum' ...