On 26 July 2013 07:13, Tsuyoshi OZAWA <[EMAIL PROTECTED]> wrote:
> Now, Apache Mesos, a distributed resource manager, is a top-level
> Apache project. Meanwhile, as you know, Hadoop has its own resource
> manager - YARN. IMHO, we should make the resource manager pluggable in
> MRv2, because MapReduce users have their own fields in which they
> would like to use it. I think this work is useful for MapReduce users.
> On the other hand, this work could also be large, because MRv2's code
> base is currently tightly coupled with YARN. Thoughts?
MRv2 is too tightly coupled to Hadoop to be moved easily; have a look at
the mapreduce package code base to see this. The two are also developed,
and currently released, in sync.
Yes, an extra layer of indirection might appear to get MR working on Mesos,
but things like locality, the ongoing development of the YARN APIs, and the
release schedule all push MRv2 to focus on YARN: data-aware job (and
service) scheduling in Hadoop clusters.
As an example of how those layers of indirection cause problems, look at
commons-logging. It is ubiquitous as the API in front of Log4J, when using
raw Log4J would have been better (look in the Hadoop test code, where the
underlying logger is explicitly extracted and tuned, for examples).
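
To make the commons-logging point concrete, here is a minimal, self-contained
sketch of the pattern. All of the types below (Log, BackendLogger, BackendLog)
are hypothetical stand-ins I made up for illustration, not the real
commons-logging or Log4J classes; the only claim is that the shape matches
what the test code ends up doing: cast the facade back to its concrete
adapter, pull out the backend, and tune it directly.

```java
import java.util.ArrayList;
import java.util.List;

public class FacadeLeakSketch {

    /** Hypothetical facade: the generic API that callers are meant to code against. */
    interface Log {
        void debug(String message);
    }

    /** Hypothetical backend: the concrete logger, with a knob the facade does not expose. */
    static final class BackendLogger {
        private boolean debugEnabled = false;
        private final List<String> lines = new ArrayList<>();

        void setDebugEnabled(boolean enabled) {
            debugEnabled = enabled;
        }

        void write(String message) {
            // Messages are silently dropped until debug output is switched on.
            if (debugEnabled) {
                lines.add(message);
            }
        }

        List<String> lines() {
            return lines;
        }
    }

    /** Hypothetical adapter that hides the backend behind the facade. */
    static final class BackendLog implements Log {
        private final BackendLogger backend = new BackendLogger();

        @Override
        public void debug(String message) {
            backend.write(message);
        }

        BackendLogger getBackend() {
            return backend;
        }
    }

    /** Stand-in for a factory that hands out only the facade type. */
    static Log getLog() {
        return new BackendLog();
    }

    public static List<String> demo() {
        Log log = getLog();
        log.debug("dropped"); // debug is off, and the facade offers no way to turn it on

        // The workaround: cast through the facade to reach the real logger.
        BackendLogger backend = ((BackendLog) log).getBackend();
        backend.setDebugEnabled(true);

        log.debug("kept");
        return backend.lines();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [kept]
    }
}
```

Once callers need that cast, the facade is no longer buying any independence
from the backend; it is just an extra layer to reach through. A pluggable
resource manager layer under MRv2 risks ending up the same way as soon as
the code needs something YARN-specific, locality being the obvious case.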