+1 for Tom's suggestion. That is how we have transparently redirected MR
jobs to use Tez as the execution framework.
From: Tom White [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 31, 2013 8:41 AM
Cc: [EMAIL PROTECTED]
Subject: Re: Abstraction layer to support both YARN and Mesos
I can see value in this, since it would allow MR programs and libraries to
run on either YARN or Mesos with no recompilation. The value here is
really in the libraries since it means library maintainers don't have to
maintain two versions of their library.
Note that there is no extra level of indirection required - it's already
there in org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider -
which is used to switch between submitting jobs to the JobTracker and
submitting to YARN's RM. A MesosClientProtocolProvider might be hosted in
Mesos - perhaps Mesos developers are already working on this?
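
To make the provider idea concrete, here is a minimal, self-contained sketch of the selection pattern. It does not use the real Hadoop classes: Protocol, Provider, forFramework and select are hypothetical stand-ins, and the "mesos" framework name is an assumption for illustration only. It assumes each provider inspects the mapreduce.framework.name setting and returns null when it is not the one selected, and it uses a fixed provider list in place of Hadoop's actual discovery mechanism.

```java
import java.util.List;
import java.util.Map;

public class ProviderSketch {
    /** Hypothetical stand-in for the job submission protocol. */
    interface Protocol {
        String submit(String jobName);
    }

    /** Hypothetical stand-in for a protocol provider; returns null if not selected. */
    interface Provider {
        Protocol create(Map<String, String> conf);
    }

    static final String FRAMEWORK_KEY = "mapreduce.framework.name";

    /** Builds a provider that only answers for one framework name. */
    static Provider forFramework(String name) {
        return conf -> {
            if (!name.equals(conf.get(FRAMEWORK_KEY))) {
                return null; // this provider is not the one the config asks for
            }
            return job -> job + " submitted via " + name;
        };
    }

    // "classic" and "yarn" are existing framework names; "mesos" is assumed here.
    static final List<Provider> PROVIDERS = List.of(
            forFramework("classic"),
            forFramework("yarn"),
            forFramework("mesos"));

    /** Asks each provider in turn and uses the first one that accepts the config. */
    static Protocol select(Map<String, String> conf) {
        for (Provider provider : PROVIDERS) {
            Protocol protocol = provider.create(conf);
            if (protocol != null) {
                return protocol;
            }
        }
        throw new IllegalStateException(
                "No provider for framework: " + conf.get(FRAMEWORK_KEY));
    }

    public static void main(String[] args) {
        Protocol protocol = select(Map.of(FRAMEWORK_KEY, "mesos"));
        System.out.println(protocol.submit("wordcount")); // wordcount submitted via mesos
    }
}
```

The point of the sketch is that the client code calling select() never changes; only the configuration value and the set of available providers do, which is why no recompilation of MR programs would be needed.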
On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 26 July 2013 07:13, Tsuyoshi OZAWA <[EMAIL PROTECTED]> wrote:
>> Apache Mesos, a distributed resource manager, is now a top-level
>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>> manager, YARN. IMHO, we should make the resource manager pluggable in
>> MRv2, because MapReduce users in different fields have their own
>> resource managers that they would like to use. I think this work would
>> be useful for MapReduce users. On the other hand, it could also be a
>> large piece of work, because MRv2's code base is currently tightly
>> coupled with YARN. Thoughts?
> MRv2 is too intimately involved with Hadoop for it to easily be moved;
> have a look at the mapreduce package code base to see this. We are
> also developing and currently releasing them in sync.
> Yes, an extra layer of indirection may appear to get MR to work on
> Mesos, but things like locality, ongoing development of the YARN APIs,
> etc., and the release schedule would push for MRv2 to focus on YARN:
> data-aware job (and service) scheduling in Hadoop clusters.
> As an example of how those layers of indirection cause problems, look
> at commons-logging. Ubiquitous as the API in front of Log4J, when
> using raw Log4J would have been better (look in the Hadoop test code
> where the underlying logger is explicitly extracted and tuned for