Hadoop, mail # user - Running Back to Back Map-reduce jobs


Re: Running Back to Back Map-reduce jobs
madhu phatak 2011-06-21, 11:14
You can use ControlledJob's addDependingJob() to handle dependencies between
multiple jobs.
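
For example, here is a minimal driver sketch using the new-API JobControl
classes (org.apache.hadoop.mapreduce.lib.jobcontrol, available since Hadoop
0.21); the mapper/reducer setup and the paths below are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BackToBack {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    Job job1 = new Job(conf, "pass-1");
    // ... setMapperClass / setReducerClass / key-value types for job1 ...
    FileInputFormat.addInputPath(job1, new Path("/user/you/input"));
    FileOutputFormat.setOutputPath(job1, new Path("/user/you/pass1-out"));

    Job job2 = new Job(conf, "pass-2");
    // job2 consumes whatever job1 wrote
    FileInputFormat.addInputPath(job2, new Path("/user/you/pass1-out"));
    FileOutputFormat.setOutputPath(job2, new Path("/user/you/final-out"));

    ControlledJob cJob1 = new ControlledJob(job1, null);
    ControlledJob cJob2 = new ControlledJob(job2, null);
    cJob2.addDependingJob(cJob1);   // job2 starts only after job1 succeeds

    JobControl control = new JobControl("back-to-back");
    control.addJob(cJob1);
    control.addJob(cJob2);

    new Thread(control).start();    // JobControl implements Runnable
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}

If you only need strict sequencing rather than a dependency graph, calling
job1.waitForCompletion(true) and then submitting job2 from the same driver
achieves the same effect.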

On Tue, Jun 7, 2011 at 4:15 PM, Adarsh Sharma <[EMAIL PROTECTED]> wrote:

> Harsh J wrote:
>
>> Yes, I believe Oozie does have Pipes and Streaming action helpers as well.
>>
>> On Thu, Jun 2, 2011 at 5:05 PM, Adarsh Sharma <[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>> OK, is this valid for running jobs through Hadoop Pipes too?
>>>
>>> Thanks
>>>
>>> Harsh J wrote:
>>>
>>>
>>>> Oozie's workflow feature may be exactly what you're looking for. It
>>>> can also do much more than just chain jobs.
>>>>
>>>> Check out additional features at: http://yahoo.github.com/oozie/
>>>>
>>>> On Thu, Jun 2, 2011 at 4:48 PM, Adarsh Sharma <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>
> After following the points below, I am confused about the examples used
> in the documentation:
>
> http://yahoo.github.com/oozie/releases/3.0.0/WorkflowFunctionalSpec.html#a3.2.2.3_Pipes
>
> What I want to achieve is for a job to terminate only with my permission,
> i.e. if I want to run another map-reduce job after the completion of one,
> it runs and then terminates after my code executes.
> I struggled to find a simple example that demonstrates this concept. In the
> Oozie documentation, they are just setting parameters and using them.
>
> For example, a simple Hadoop Pipes job is executed by:
>
> int main(int argc, char *argv[]) {
>   return HadoopPipes::runTask(
>       HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
> }
>
> Now if I want to run another job after this on the reduced data in HDFS,
> how could this be possible? Do I need to add some code?
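>
> One option might be a small Java driver that submits the second pipes job
> only after the first one returns. A rough sketch, assuming the old-API
> pipes Submitter; the executable locations and paths below are placeholders:
>
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.pipes.Submitter;
>
> public class PipesChain {
>   public static void main(String[] args) throws Exception {
>     JobConf job1 = new JobConf();
>     job1.setJobName("wordcount");
>     Submitter.setExecutable(job1, "hdfs:///bin/wordcount");  // placeholder
>     Submitter.setIsJavaRecordReader(job1, true);
>     Submitter.setIsJavaRecordWriter(job1, true);
>     FileInputFormat.setInputPaths(job1, new Path("/gutenberg"));
>     FileOutputFormat.setOutputPath(job1, new Path("/tmp/pass1"));
>     Submitter.runJob(job1);         // blocks until the first job finishes
>
>     JobConf job2 = new JobConf();
>     job2.setJobName("second-pass");
>     Submitter.setExecutable(job2, "hdfs:///bin/secondpass"); // placeholder
>     Submitter.setIsJavaRecordReader(job2, true);
>     Submitter.setIsJavaRecordWriter(job2, true);
>     FileInputFormat.setInputPaths(job2, new Path("/tmp/pass1")); // job1's output
>     FileOutputFormat.setOutputPath(job2, new Path("/final"));
>     Submitter.runJob(job2);         // runs only after job1 has completed
>   }
> }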
>
> Thanks
>
>>>>> Dear all,
>>>>>
>>>>> I ran several map-reduce jobs in a Hadoop cluster of 4 nodes.
>>>>>
>>>>> Now I want one map-reduce job to run right after another completes.
>>>>>
>>>>> For example, to clarify my point, suppose a wordcount is run on a
>>>>> Gutenberg file in HDFS, and after completion:
>>>>>
>>>>> 11/06/02 15:14:35 WARN mapred.JobClient: No job jar file set.  User
>>>>> classes
>>>>> may not be found. See JobConf(Class) or JobConf#setJar(String).
>>>>> 11/06/02 15:14:35 INFO mapred.FileInputFormat: Total input paths to
>>>>> process
>>>>> : 3
>>>>> 11/06/02 15:14:36 INFO mapred.JobClient: Running job:
>>>>> job_201106021143_0030
>>>>> 11/06/02 15:14:37 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 11/06/02 15:14:50 INFO mapred.JobClient:  map 33% reduce 0%
>>>>> 11/06/02 15:14:59 INFO mapred.JobClient:  map 66% reduce 11%
>>>>> 11/06/02 15:15:08 INFO mapred.JobClient:  map 100% reduce 22%
>>>>> 11/06/02 15:15:17 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 11/06/02 15:15:25 INFO mapred.JobClient: Job complete:
>>>>> job_201106021143_0030
>>>>> 11/06/02 15:15:25 INFO mapred.JobClient: Counters: 18
>>>>>
>>>>> Again, a map-reduce job is started on the output or the original
>>>>> data, say:
>>>>>
>>>>> 11/06/02 15:14:36 INFO mapred.JobClient: Running job:
>>>>> job_201106021143_0030
>>>>> 11/06/02 15:14:37 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 11/06/02 15:14:50 INFO mapred.JobClient:  map 33% reduce 0%
>>>>>
>>>>> Is this possible, or are there any parameters to achieve it?
>>>>>
>>>>> Please guide.
>>>>>
>>>>> Thanks