MRv1 JT Availability (was [DISCUSS] Spin out MR, HDFS and YARN ...)
Andrew Purtell 2012-09-09, 17:57
On Mon, Sep 3, 2012 at 4:02 AM Arun C Murthy wrote:
> > On Sep 1, 2012, at 6:32 AM, Andrew Purtell wrote:
> > I'd imagine such a MR(v1) in Hadoop, if this happened, would concentrate on
> > performance improvements, maybe such things as alternate shuffle plugins.
> > Perhaps a HA JobTracker for parity with HDFS.
> Lots of this has already happened in branch-1, please look at:
> # JT Availability: MAPREDUCE-3837, MAPREDUCE-4328, MAPREDUCE-4603 (WIP)
Thanks for the pointers!
I want to be clearer about what I meant by "HA JobTracker for
parity with HDFS". With a highly available NameNode there should be no
need to quiesce the JT, and restarting jobs from the beginning if
the JT crashes isn't good enough to meet the user expectations implied
by "high availability", at least for our internal customers.
I meant hot JT failover: there is a primary and a backup JT, they
share state sufficient for the backup to take over immediately if
the primary fails, and the TTs and JobClients both switch
seamlessly to the backup should their communications with the primary
fail. I'd expect state sharing to limit scalability to the small- and
medium-cluster range, and that's fine; YARN is already the answer for
scalability issues in the large and largest clusters.
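
To make the client side of that concrete, here is a rough sketch of the
switch-over behavior I have in mind, in Python for brevity. Everything in
it is hypothetical: FailoverClient, make_jt, and the shared_state dict are
stand-ins invented for illustration, not anything that exists in branch-1
or in the JIRAs above, and the dict only imitates whatever state the
primary and backup would actually share.

```python
class FailoverClient:
    """Hypothetical TT/JobClient stub: call the active JT, switch on failure."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one JT endpoint")
        self.endpoints = list(endpoints)
        self.active = 0  # index of the JT we currently believe is serving

    def call(self, request):
        last_error = None
        # Try each endpoint at most once, starting with the active one.
        for attempt in range(len(self.endpoints)):
            idx = (self.active + attempt) % len(self.endpoints)
            try:
                result = self.endpoints[idx](request)
            except ConnectionError as err:
                last_error = err  # communication failed, try the next JT
                continue
            self.active = idx  # remember which JT answered
            return result
        raise ConnectionError("all JT endpoints failed") from last_error


# Stand-in for job state replicated between primary and backup (assumption).
shared_state = {"job_42": "RUNNING"}


def make_jt(name, alive):
    """Build a fake JT endpoint that answers only while alive['up'] is True."""
    def handle(job_id):
        if not alive["up"]:
            raise ConnectionError(name + " unreachable")
        return (name, shared_state[job_id])
    return handle


primary_alive = {"up": True}
backup_alive = {"up": True}
client = FailoverClient([
    make_jt("primary", primary_alive),
    make_jt("backup", backup_alive),
])
```

While the primary is up, client.call("job_42") is answered by the primary.
After setting primary_alive["up"] = False, the same call is answered by the
backup with the same job status, because both read the shared state. The
hard part, of course, is keeping that state shared, which this sketch
simply assumes.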
> # Performance - backports of PureJavaCrc32 in spills (MAPREDUCE-782), fadvise backports (MAPREDUCE-3289) and several other misc. fixes.
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)