Re: Re: How do I set the intermediate output path when I use 2 mapreduce jobs?
Hi Jun Tan,

Yes, I use the 0.21.0 version, so I have used those classes. The Hadoop
Definitive Guide has job dependency examples for 0.20.x.
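
On 0.20.2 you can get the same effect without ControlledJob by running the
jobs back to back and waiting on each one. A minimal sketch of that approach
(Driver, MyMapper1, MyReducer1 and MyMapper2 are placeholder names, not
classes from my code):

    // Sequential chaining on the 0.20.x new API: no JobControl needed.
    // Driver/MyMapper1/MyReducer1/MyMapper2 are hypothetical placeholders.
    Configuration conf = new Configuration();
    Job first = new Job(conf, "first");
    first.setJarByClass(Driver.class);
    first.setMapperClass(MyMapper1.class);
    first.setReducerClass(MyReducer1.class);
    first.setOutputKeyClass(Text.class);
    first.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(first, new Path(args[0]));
    Path intermediate = new Path(args[1] + ".tmp");
    FileOutputFormat.setOutputPath(first, intermediate);
    if (!first.waitForCompletion(true)) {
        System.exit(1); // stop the chain if the first job fails
    }

    Job second = new Job(new Configuration(), "second");
    second.setJarByClass(Driver.class);
    second.setMapperClass(MyMapper2.class);
    second.setOutputKeyClass(IntWritable.class);
    second.setOutputValueClass(Text.class);
    // The first job's whole output directory is the second job's input;
    // FileInputFormat skips files starting with "_" or "." by default.
    FileInputFormat.addInputPath(second, intermediate);
    FileOutputFormat.setOutputPath(second, new Path(args[1]));
    System.exit(second.waitForCompletion(true) ? 0 : 1);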

Thank You,

2011/9/23 谭军 <[EMAIL PROTECTED]>

> Swathi.V.,
> ControlledJob cannot be resolved in my Eclipse.
> My Hadoop version is 0.20.2.
> Can ControlledJob only be resolved in Hadoop 0.21.0 (+)?
> Or do I need certain plugins?
> Thanks
>
> --
>
> Regards!
>
> Jun Tan
>
> At 2011-09-22 00:56:54, "Swathi V" <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
>
> This code might help you. Here is the JobDependancies.java snippet,
> completed with the imports and class wrapper it needs to run (WordMapper,
> WordReducer and SortWordMapper are my own classes, not shown here):
>
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
> import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class JobDependancies {
>
>     public static void main(String[] args) throws IOException {
>         // Job 1: word count.
>         Configuration conf = new Configuration();
>         Job job1 = new Job(conf, "job1");
>         job1.setJarByClass(JobDependancies.class);
>         job1.setMapperClass(WordMapper.class);
>         job1.setReducerClass(WordReducer.class);
>         job1.setOutputKeyClass(Text.class);
>         job1.setOutputValueClass(IntWritable.class);
>         FileInputFormat.addInputPath(job1, new Path(args[0]));
>         // Unique intermediate directory, so reruns do not collide.
>         String out = args[1] + System.nanoTime();
>         FileOutputFormat.setOutputPath(job1, new Path(out));
>
>         // Job 2: sort, reading job1's output.
>         Configuration conf2 = new Configuration();
>         Job job2 = new Job(conf2, "job2");
>         job2.setJarByClass(JobDependancies.class);
>         job2.setOutputKeyClass(IntWritable.class);
>         job2.setOutputValueClass(Text.class);
>         job2.setMapperClass(SortWordMapper.class);
>         job2.setReducerClass(Reducer.class); // identity reducer
>         FileInputFormat.addInputPath(job2, new Path(out + "/part-r-00000"));
>         FileOutputFormat.setOutputPath(job2, new Path(args[1]));
>
>         // Wire the dependency: job2 starts only after job1 succeeds.
>         ControlledJob controlledJob1 = new ControlledJob(job1.getConfiguration());
>         ControlledJob controlledJob2 = new ControlledJob(job2.getConfiguration());
>         controlledJob2.addDependingJob(controlledJob1);
>
>         JobControl jobControl = new JobControl("control");
>         jobControl.addJob(controlledJob1);
>         jobControl.addJob(controlledJob2);
>
>         // JobControl is a Runnable: run it in its own thread and poll.
>         Thread thread = new Thread(jobControl);
>         thread.start();
>         while (!jobControl.allFinished()) {
>             try {
>                 Thread.sleep(10000);
>             } catch (InterruptedException e) {
>                 e.printStackTrace();
>             }
>         }
>         jobControl.stop();
>     }
> }
>
>
> The wordcount output of job1 is fed to the sort in job2. Irrespective of
> the mappers and reducers involved, the above is the way to chain multiple
> jobs.
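>
> By the way, the intermediate directory (args[1] + nanoTime) is left on
> HDFS after the chain finishes. If you do not need it, a small sketch of a
> cleanup step after jobControl.stop() (this is an addition, not part of my
> snippet above):
>
>     // Delete the intermediate output recursively once both jobs are done.
>     FileSystem fs = FileSystem.get(conf); // needs org.apache.hadoop.fs.FileSystem
>     fs.delete(new Path(out), true);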
>
> 2011/9/21 谭军 <[EMAIL PROTECTED]>
>
>> Hi,
>> I want to run 2 MR jobs sequentially.
>> The first job writes its intermediate result to a temp file.
>> The second job reads the result from that temp file, not from the
>> FileInputPath.
>> I tried this, but a FileNotFoundException was reported.
>> Then I checked the datanodes: the temp file had been created, and the
>> first job had executed correctly.
>> Why can't the second job find the file? The file was created before the
>> second job was executed.
>> Thanks!
>>
>> --
>>
>> Regards!
>>
>> Jun Tan
>>
>>
>>
>
>
> --
> Regards,
> Swathi.V.
>
>
>
>
--
Regards,
Swathi.V.