I can see value in this, since it would allow MR programs and
libraries to run on either YARN or Mesos with no recompilation. The
real value is for libraries, since their maintainers would not have
to maintain two versions of each library.
Note that no extra level of indirection is required. It is
already there in
org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider, which is
used to switch between submitting jobs to the JobTracker and
submitting to YARN's RM. A MesosClientProtocolProvider might be hosted
in Mesos. Perhaps the Mesos developers are already working on this?
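
To make the idea concrete, here is a rough sketch of the provider
pattern. It deliberately does not use the real Hadoop classes: the
interfaces below are simplified stand-ins, the "mesos" entry is purely
hypothetical, and the hard-coded list replaces the service discovery
that, as I understand it, Hadoop uses to find providers. The point is
only that each provider inspects mapreduce.framework.name and returns
null when the configuration is not meant for it.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration; NOT the real Hadoop classes.
class ProviderSketch {

    /** Stand-in for the job submission interface. */
    interface ClientProtocol {
        String describe();
    }

    /** Stand-in for a provider: returns null if the config is not for it. */
    interface ClientProtocolProvider {
        ClientProtocol create(Map<String, String> conf);
    }

    static final String FRAMEWORK_NAME = "mapreduce.framework.name";

    /** Builds a provider that only answers for one framework name. */
    static ClientProtocolProvider forFramework(String name, String description) {
        return conf -> {
            if (!name.equals(conf.get(FRAMEWORK_NAME))) {
                return null; // not ours; let the next provider try
            }
            ClientProtocol client = () -> description;
            return client;
        };
    }

    // Assumption: a fixed list instead of runtime discovery.
    static final List<ClientProtocolProvider> PROVIDERS = List.of(
        forFramework("classic", "submit to JobTracker"),
        forFramework("yarn", "submit to YARN ResourceManager"),
        forFramework("mesos", "submit to Mesos (hypothetical)"));

    /** Returns the description of the first provider that accepts the config. */
    static String select(Map<String, String> conf) {
        for (ClientProtocolProvider provider : PROVIDERS) {
            ClientProtocol client = provider.create(conf);
            if (client != null) {
                return client.describe();
            }
        }
        throw new IllegalStateException(
            "no provider for " + conf.get(FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        System.out.println(select(Map.of(FRAMEWORK_NAME, "yarn")));
        System.out.println(select(Map.of(FRAMEWORK_NAME, "mesos")));
    }
}
```

With this shape, a job or library never names the resource manager in
code; only the configuration value changes.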

On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 26 July 2013 07:13, Tsuyoshi OZAWA <[EMAIL PROTECTED]> wrote:
>> Now Apache Mesos, a distributed resource manager, is a top-level
>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>> manager, YARN. IMHO, we should make the resource manager pluggable in
>> MRv2, because MapReduce users in different fields have their own
>> resource managers that they would like to use. I think this work
>> would be useful for MapReduce users. On the other hand, it could
>> also be large, because MRv2's code base is currently tightly
>> coupled with YARN. Thoughts?
> MRv2 is too intimately involved with Hadoop for it to be moved easily; have
> a look at the mapreduce package code base to see this. We are also
> developing and currently releasing them in sync.
> Yes, an extra layer of indirection may appear to get MR to work on Mesos,
> but things like locality, ongoing development of the YARN APIs, and the
> release schedule would push for MRv2 to focus on YARN: data-aware job
> (and service) scheduling in Hadoop clusters.
> As an example of how those layers of indirection cause problems, look at
> commons-logging. It is ubiquitous as the API in front of Log4J, when using
> raw Log4J would have been better (for examples, look in the Hadoop test
> code, where the underlying logger is explicitly extracted and tuned).