I would definitely check out Oozie for this use case.
On Thu, Sep 29, 2011 at 12:51 PM, Aaron Baff <[EMAIL PROTECTED]> wrote:
> I saw this, but wasn't sure whether it ran on the client and just submitted the Jobs in sequence, or whether it handed them all to the JobTracker, which then took care of submitting the Jobs in sequence appropriately.
> Basically, I'm looking for a completely stateless client that doesn't need to ping the JobTracker every now and then to see if a Job has completed and then submit the next one. The ideal flow would be: the client gets a request to run the series of Jobs, preps and configures them all, and then passes them off to the JobTracker, which runs them all in order without the client application needing to do anything further.
> Sounds like that doesn't really exist as part of the Hadoop framework, and needs something like Oozie (or a home-built system) to do this.
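For reference, the client-side sequencing described above boils down to something like the sketch below. It is plain Java, not Hadoop API: the Step class and runInOrder method are made-up names for illustration. In a real client each step body would presumably be a blocking call such as job.waitForCompletion(true), which is exactly why the client process has to stay alive for the whole chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

public class SequentialJobChain {

    /** Hypothetical stand-in for one MapReduce job; the callable returns true on success. */
    public static final class Step {
        final String name;
        final Callable<Boolean> work;

        public Step(String name, Callable<Boolean> work) {
            this.name = name;
            this.work = work;
        }
    }

    /**
     * Runs the steps in order and stops at the first failure.
     * Returns the names of the steps that completed successfully.
     */
    public static List<String> runInOrder(List<Step> steps) {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            boolean ok;
            try {
                // Blocks until this step finishes; the client must wait here.
                ok = step.work.call();
            } catch (Exception e) {
                ok = false; // treat an exception as a failed step
            }
            if (!ok) {
                break; // later steps depend on this one, so do not run them
            }
            completed.add(step.name);
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Step> chain = Arrays.asList(
                new Step("extract", () -> true),
                new Step("aggregate", () -> false), // simulated failure
                new Step("report", () -> true));
        System.out.println(runInOrder(chain)); // prints [extract]
    }
}
```

Anything that removes the need for this loop has to run it somewhere else, which is the role a workflow server plays.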
> -----Original Message-----
> From: Harsh J [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 28, 2011 9:37 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Running multiple MR Job's in sequence
> Within the Hadoop core project, there is JobControl you can utilize
> for this. You can view its API at
> and it is fairly simple to use (create jobs with the regular Java API, then
> build a dependency flow using JobControl atop these JobConf objects).
> Apache Oozie and other such tools offer higher-level abstractions for
> controlling a workflow, and can be considered when your needs get
> a bit more complex than just a series of jobs (they make it easy to
> handle failure scenarios between dependent jobs, perform minor fs
> operations in pre/post processing, etc.).
> On Thu, Sep 29, 2011 at 5:26 AM, Aaron Baff <[EMAIL PROTECTED]> wrote:
>> Is it possible to submit a series of MR Jobs to the JobTracker to run in sequence (one finishes, take the output of that if successful and feed it into the next, etc), or does it need to run client side by using the JobControl or something like Oozie, or rolling our own? What I'm looking for is a fire & forget, and occasionally check back to see if it's done. So client-side doesn't need to really know anything or keep track of anything. Does something like that exist within the Hadoop framework?
> Harsh J