Re: Running Back to Back Map-reduce jobs
madhu phatak 2011-06-21, 11:14
You can use ControlledJob's addDependingJob to handle dependencies between
multiple jobs.

On Tue, Jun 7, 2011 at 4:15 PM, Adarsh Sharma <[EMAIL PROTECTED]> wrote:

> Harsh J wrote:
>
>> Yes, I believe Oozie does have Pipes and Streaming action helpers as well.
>>
>> On Thu, Jun 2, 2011 at 5:05 PM, Adarsh Sharma <[EMAIL PROTECTED]> wrote:
>>
>>> OK, is it valid for running jobs through Hadoop Pipes too?
>>>
>>> Thanks
>>>
>>> Harsh J wrote:
>>>
>>>> Oozie's workflow feature may exactly be what you're looking for. It
>>>> can also do much more than just chain jobs.
>>>>
>>>> Check out additional features at: http://yahoo.github.com/oozie/
>>>>
>>>> On Thu, Jun 2, 2011 at 4:48 PM, Adarsh Sharma <[EMAIL PROTECTED]> wrote:
>>>>
>
> After following the below points, I am confused about the examples used
> in the documentation:
>
> http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a3.2.2.3_Pipes
>
> What I want to achieve is to terminate a job after my permission, i.e. if I
> want to run a map-reduce job again after the completion of one, it runs and
> then terminates after my code execution.
> I struggled to find a simple example that proves this concept. In the Oozie
> documentation, they are just setting parameters and using them.
>
> For example, a simple Hadoop Pipes job is executed by:
>
> int main(int argc, char *argv[]) {
>   return HadoopPipes::runTask(
>       HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
> }
>
> Now if I want to run another job after this on the reduced data in HDFS,
> how could this be possible? Do I need to add some code?
>
> Thanks
>
>>>>> Dear all,
>>>>>
>>>>> I ran several map-reduce jobs in a Hadoop cluster of 4 nodes.
>>>>>
>>>>> Now this time I want a map-reduce job to be run again after one.
>>>>>
>>>>> For example, to make my point clear, suppose a wordcount is run on a
>>>>> gutenberg file in HDFS and after completion
>>>>>
>>>>> 11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>>>>> 11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to process : 3
>>>>> 11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
>>>>> 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
>>>>> 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%
>>>>> 11/06/02 15:14:59 INFO mapred.JobClient: map 66% reduce 11%
>>>>> 11/06/02 15:15:08 INFO mapred.JobClient: map 100% reduce 22%
>>>>> 11/06/02 15:15:17 INFO mapred.JobClient: map 100% reduce 100%
>>>>> 11/06/02 15:15:25 INFO mapred.JobClient: Job complete: job_201106021143_0030
>>>>> 11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18
>>>>>
>>>>> Again a map-reduce job is started on the output or the original data,
>>>>> say again
>>>>>
>>>>> 11/06/02 15:14:36 INFO mapred.JobClient: Running job: job_201106021143_0030
>>>>> 11/06/02 15:14:37 INFO mapred.JobClient: map 0% reduce 0%
>>>>> 11/06/02 15:14:50 INFO mapred.JobClient: map 33% reduce 0%
>>>>>
>>>>> Is it possible, or are there any parameters to achieve it?
>>>>>
>>>>> Please guide.
>>>>>
>>>>> Thanks
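
For readers following the ControlledJob suggestion at the top of this message, here is a minimal, Hadoop-free sketch of the dependency rule that addDependingJob expresses: a job runs only after every job it depends on has succeeded, and it is marked as dependency-failed if one of them fails. The `Step` and `ChainSketch` classes and their methods are hypothetical stand-ins written for illustration, not Hadoop classes. In a real program the corresponding pieces would presumably be `ControlledJob`, `addDependingJob`, and `JobControl`, with the second job's input path set to the first job's output directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job with prerequisites; NOT a Hadoop class.
final class Step {
    enum State { WAITING, SUCCESS, FAILED, DEPENDENT_FAILED }

    final String name;
    final Runnable work;
    final List<Step> dependsOn = new ArrayList<>();
    State state = State.WAITING;

    Step(String name, Runnable work) {
        this.name = name;
        this.work = work;
    }

    // Models the idea behind addDependingJob: this step waits for 'prerequisite'.
    void addDependency(Step prerequisite) {
        dependsOn.add(prerequisite);
    }
}

// Hypothetical stand-in for a job controller; NOT a Hadoop class.
final class ChainSketch {
    // Runs every step whose prerequisites succeeded; returns names in run order.
    static List<String> runAll(List<Step> steps) {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {                      // stop when a full pass changes nothing
            progressed = false;
            for (Step s : steps) {
                if (s.state != Step.State.WAITING) {
                    continue;
                }
                boolean anyFailed = false;
                boolean allSucceeded = true;
                for (Step d : s.dependsOn) {
                    if (d.state == Step.State.FAILED
                            || d.state == Step.State.DEPENDENT_FAILED) {
                        anyFailed = true;
                    }
                    if (d.state != Step.State.SUCCESS) {
                        allSucceeded = false;
                    }
                }
                if (anyFailed) {
                    s.state = Step.State.DEPENDENT_FAILED;
                    progressed = true;
                } else if (allSucceeded) {
                    try {
                        s.work.run();
                        s.state = Step.State.SUCCESS;
                        order.add(s.name);
                    } catch (RuntimeException e) {
                        s.state = Step.State.FAILED;
                    }
                    progressed = true;
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Step wordCount = new Step("wordcount",
                () -> System.out.println("first job runs"));
        Step secondPass = new Step("second-pass",
                () -> System.out.println("second job runs on the first job's output"));
        secondPass.addDependency(wordCount);
        // Listed out of order on purpose; the dependency still decides the run order.
        System.out.println(runAll(List.of(secondPass, wordCount)));
    }
}
```

Running `main` prints the two job messages in dependency order, followed by `[wordcount, second-pass]`. The sketch runs steps one at a time for simplicity, whereas a real controller may run independent jobs in parallel.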