Re: Abstraction layer to support both YARN and Mesos
Vinod Kumar Vavilapalli 2013-07-31, 16:45
What I thought was the original proposal was to use the existing MR client+AM+task code to run on top of Mesos. And like Steve mentioned, today all of it is very tightly coupled with YARN APIs. Using JobClient against a Mesos implementation of MapReduce is easy; changing the AM to start getting containers from Mesos and launching them via Mesos needs more abstractions. And at this point in time, again as Steve laid out clearly, the focus of the MapReduce project is on stabilizing and shipping together with YARN.
That said, thinking through those abstractions inside the MR AM is a step forward IF there is enough interest around this. I see a couple of people already showing enthusiasm, but it'll be great to see more interest, maybe from a few people in the Mesos community who understand what those abstractions should look like.
The last thing we want is to create unnecessary abstractions now that may never get used in the future.
On Jul 31, 2013, at 9:34 AM, Bikas Saha wrote:
> +1 for Tom's suggestion. That is how we have transparently redirected MR
> jobs to use Tez as the execution framework.
> -----Original Message-----
> From: Tom White [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 31, 2013 8:41 AM
> To: mapreduce-dev
> Cc: [EMAIL PROTECTED]
> Subject: Re: Abstraction layer to support both YARN and Mesos
> I can see value in this, since it would allow MR programs and libraries to
> run on either YARN or Mesos with no recompilation. The value here is
> really in the libraries since it means library maintainers don't have to
> maintain two versions of their library.
> Note that there is no extra level of indirection required - it's already
> there in org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider -
> which is used to switch between submitting jobs to the JobTracker and
> submitting to YARN's RM. A MesosClientProtocolProvider might be hosted in
> Mesos - perhaps Mesos developers are already working on this?
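
To illustrate the provider-switching pattern Tom describes, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: `ClientProtocol` and `ClientProtocolProvider` below are simplified stand-ins, `MesosProvider` is purely hypothetical, and a plain `Map` stands in for Hadoop's `Configuration`. Only the `mapreduce.framework.name` key and the `yarn` value come from Hadoop itself; the `mesos` value is an assumption made for this example. Real Hadoop discovers providers through `java.util.ServiceLoader` rather than an explicit list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProviderSketch {

    /** Simplified stand-in for Hadoop's job-submission protocol. */
    interface ClientProtocol {
        String submitJob(String jobName);
    }

    /**
     * Simplified stand-in for ClientProtocolProvider. Each provider
     * returns null when the configuration is not meant for it.
     */
    static abstract class ClientProtocolProvider {
        abstract ClientProtocol create(Map<String, String> conf);
    }

    /** Handles configurations that select YARN. */
    static class YarnProvider extends ClientProtocolProvider {
        @Override
        ClientProtocol create(Map<String, String> conf) {
            if (!"yarn".equals(conf.get("mapreduce.framework.name"))) {
                return null;
            }
            return jobName -> "yarn:" + jobName;
        }
    }

    /** Hypothetical provider; the "mesos" value is an assumption. */
    static class MesosProvider extends ClientProtocolProvider {
        @Override
        ClientProtocol create(Map<String, String> conf) {
            if (!"mesos".equals(conf.get("mapreduce.framework.name"))) {
                return null;
            }
            return jobName -> "mesos:" + jobName;
        }
    }

    /** Asks each provider in turn and uses the first one that accepts the config. */
    static ClientProtocol select(Map<String, String> conf,
                                 List<ClientProtocolProvider> providers) {
        for (ClientProtocolProvider provider : providers) {
            ClientProtocol protocol = provider.create(conf);
            if (protocol != null) {
                return protocol;
            }
        }
        throw new IllegalStateException(
            "No provider for framework: " + conf.get("mapreduce.framework.name"));
    }

    public static void main(String[] args) {
        List<ClientProtocolProvider> providers = new ArrayList<>();
        providers.add(new YarnProvider());
        providers.add(new MesosProvider());

        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.framework.name", "mesos");

        // The client code is identical whichever provider is selected.
        System.out.println(select(conf, providers).submitJob("wordcount"));
    }
}
```

The point of the sketch is that client code only sees `ClientProtocol`, so a job or library compiled once can be redirected by configuration alone. It covers only the submission side; the AM-side abstractions discussed at the top of this thread are a separate problem.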
> On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> On 26 July 2013 07:13, Tsuyoshi OZAWA <[EMAIL PROTECTED]> wrote:
>>> Now, Apache Mesos, a distributed resource manager, is a top-level
>>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>>> manager - YARN. IMHO, we should make the resource manager pluggable in
>>> MRv2, because each has its own field in which users of MapReduce would
>>> like to use it. I think this work is useful for MapReduce users. On the
>>> other hand, this work can also be large, because MRv2's code base is
>>> tightly coupled with YARN currently. Thoughts?
>> MRv2 is too intimately involved with Hadoop for it to easily be moved;
>> have a look at the mapreduce package code base to see this. We are
>> also developing and currently releasing them in sync.
>> Yes, an extra layer of indirection may appear to get MR to work on
>> Mesos - but things like locality, ongoing development of YARN APIs, etc.,
>> and the release schedule would push for MRv2 to focus on YARN: data-aware
>> job (and service) scheduling in Hadoop clusters.
>> As an example of how those layers of indirection cause problems, look
>> at commons-logging. Ubiquitous as the API in front of Log4J, when
>> using raw Log4J would have been better (look in the hadoop tests code
>> where the underlying logger is explicitly extracted and tuned for