The documentation is incorrect. The plan is to support such a thing, but it has not been implemented yet. When it does happen, I would like to see it be part of core MapReduce, because several projects, such as Oozie, Pig, and Hive, could share this functionality. In that case, when it does show up, it is likely to be a superset of the functionality supported by Oozie, minus the functionality that Arun mentioned, like triggering jobs on data availability and on a regular time interval. Hopefully Oozie would eventually move to use it as well. It would also allow such projects to potentially share DAG-level optimizations, like reducing or even eliminating the writing of temporary output to HDFS between small jobs, similar to what Spark does.
A DAGApplicationMaster would probably not be a DAG of generic applications; it would more likely be a DAG of MapReduce jobs plus a few other things, like what Oozie supports in its DAG definitions. The reason is that, for the DAG Application Master to be truly generic, it would need to launch other Application Masters in separate containers, whereas if we limit it to a subset of AMs we would not have to launch the separate processes, and we could provide the MR-specific DAG-level optimizations I mentioned above. We could still support launching other AMs for completeness' sake, but I see that as a lower priority.
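To make the idea a bit more concrete, here is a rough sketch of what "a DAG of jobs run inside one process" could look like. It is only an illustration, written in Python for brevity, and it is not YARN or MapReduce code: the names topological_order, run_dag, and run_job are hypothetical, and the in-memory results dictionary merely stands in for skipping temporary output to HDFS between small jobs.

```python
from collections import deque


def topological_order(deps):
    """Return the jobs in an order that respects their dependencies.

    deps maps each job name to the set of job names it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every job, including ones that only appear as a dependency.
    jobs = set(deps)
    for upstream in deps.values():
        jobs.update(upstream)

    # Count unfinished dependencies per job and record reverse edges.
    remaining = {job: len(set(deps.get(job, ()))) for job in jobs}
    dependents = {job: [] for job in jobs}
    for job, upstream in deps.items():
        for up in set(upstream):
            dependents[up].append(job)

    # Kahn's algorithm: repeatedly take a job with no unfinished dependencies.
    ready = deque(sorted(job for job, n in remaining.items() if n == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for dependent in sorted(dependents[job]):
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(jobs):
        raise ValueError("job dependencies contain a cycle")
    return order


def run_dag(deps, run_job):
    """Run every job in dependency order inside the current process.

    run_job(name, inputs) is a hypothetical callback; inputs maps each
    upstream job name to the result that job returned. Results are handed
    over in memory, standing in for avoiding temporary output on HDFS.
    """
    results = {}
    for job in topological_order(deps):
        inputs = {up: results[up] for up in deps.get(job, ())}
        results[job] = run_job(job, inputs)
    return results
```

A caller would describe the DAG as a mapping such as {"clean": {"extract"}, "report": {"clean"}} and pass a callback that runs one job. A truly generic version would instead have to launch a separate Application Master per node of the DAG, which is the extra cost described above.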
On 5/17/12 9:29 PM, "Keith Wiley" <[EMAIL PROTECTED]> wrote:
On May 17, 2012, at 17:49 , Arun C Murthy wrote:
> Currently YARN doesn't offer anything to manage a DAG of applications.
Well, there is the following webpage:
which suggests that YARN supports a dag of MR jobs within a YARN application (second paragraph, last sentence). True, it is a dag of jobs within an application, not a dag of applications, but that wasn't really my original question. My question was how the dag structure offered by YARN differs from that offered by Oozie.
It doesn't seem like the responses to my question so far have adequately reconciled Oozie's dag of jobs with YARN's dag of jobs. To the contrary, the only response I've gotten so far seems to suggest that the webpage above is simply wrong and YARN offers no form of multi-job dag at all; no response in this thread has confirmed it for example.
> It's fairly easy to implement a DAGApplicationMaster to manage a set of applications (whether MR or others).
Right, but that applies to whole applications. Isn't a dag *of* jobs within an application rather analogous to what Oozie does? Bear in mind, that is the entire premise of my original question (the degree of similarity between these two multi-job dag coordination systems). The distinction between jobs and applications is only relevant after the relationship to Oozie has been established, since that was my original question.
I'm really sorry about the apparent misunderstanding. I didn't intend any confusion on the matter. I simply read the webpage and was immediately curious about its implications for Oozie, that's all.
> PS: Please use mapreduce-dev@ for technical discussions, general@ is used for project discussions/announcements. Thanks.
Oof, sorry about that. It's hard to move a thread mid-discussion, of course, since that messes up the archives. I also still don't feel that the text on the webpage quoted above, which clearly describes YARN's dag of jobs, has been addressed, so I'm carrying on for the sake of "the historical record". I do apologize for not targeting my question at the most relevant mailing list. A mailing list named "general" struck me as, well, general, but I must have misinterpreted it.
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
-- Edwin A. Abbott, Flatland