Re: How do I set the intermediate output path when I use 2 mapreduce jobs?
Swathi V 2011-09-21, 16:56
Hi,

This code might help you:
// JobDependancies.java (WordMapper, WordReducer and SortWordMapper are the
// user's own classes and are not shown)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobDependancies {

    public static void main(String[] args) throws Exception {
        // Job 1: word count
        Configuration conf = new Configuration();
        Job job1 = new Job(conf, "job1");
        job1.setJarByClass(JobDependancies.class);
        job1.setMapperClass(WordMapper.class);
        job1.setReducerClass(WordReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        // Write the intermediate result to a unique temporary directory.
        String out = args[1] + System.nanoTime();
        FileOutputFormat.setOutputPath(job1, new Path(out));

        // Job 2: sort, reading job1's output (assumes a single reduce file)
        Configuration conf2 = new Configuration();
        Job job2 = new Job(conf2, "job2");
        job2.setJarByClass(JobDependancies.class);
        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(Text.class);
        job2.setMapperClass(SortWordMapper.class);
        job2.setReducerClass(Reducer.class);      // identity reducer
        FileInputFormat.addInputPath(job2, new Path(out + "/part-r-00000"));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));

        // Declare the dependency: job2 starts only after job1 succeeds.
        ControlledJob controlledJob1 = new ControlledJob(job1.getConfiguration());
        ControlledJob controlledJob2 = new ControlledJob(job2.getConfiguration());
        controlledJob2.addDependingJob(controlledJob1);

        JobControl jobControl = new JobControl("control");
        jobControl.addJob(controlledJob1);
        jobControl.addJob(controlledJob2);

        // JobControl is a Runnable: run it in its own thread and poll
        // until both jobs have finished.
        Thread thread = new Thread(jobControl);
        thread.start();
        while (!jobControl.allFinished()) {
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        jobControl.stop();
    }
}
Here the word-count output of job1 is fed to job2 for sorting. Irrespective of
which mappers and reducers you use, this is the way to chain many jobs.
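
If you only need two jobs run back to back, a simpler alternative is to block
on the first job with waitForCompletion() and point the second job's input at
the intermediate directory. This is only a sketch, reusing the same
WordMapper/WordReducer/SortWordMapper classes and the args layout assumed in
the snippet above:

// Sequential alternative to JobControl: run job1 to a temp path, then job2.
Configuration conf = new Configuration();
Job job1 = new Job(conf, "wordcount");
job1.setJarByClass(JobDependancies.class);
job1.setMapperClass(WordMapper.class);
job1.setReducerClass(WordReducer.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path(args[0]));
Path temp = new Path(args[1] + "_tmp");        // intermediate output path
FileOutputFormat.setOutputPath(job1, temp);

if (!job1.waitForCompletion(true)) {           // block until job1 finishes
    System.exit(1);                            // stop if job1 failed
}

Job job2 = new Job(new Configuration(), "sort");
job2.setJarByClass(JobDependancies.class);
job2.setMapperClass(SortWordMapper.class);
job2.setReducerClass(Reducer.class);           // identity reducer
job2.setOutputKeyClass(IntWritable.class);
job2.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job2, temp);      // read all part files job1 wrote
FileOutputFormat.setOutputPath(job2, new Path(args[1]));
System.exit(job2.waitForCompletion(true) ? 0 : 1);

Passing the whole directory (rather than a single part-r-00000 file) also
works, since FileInputFormat picks up every part file and skips entries that
start with "_" such as _SUCCESS and _logs.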

2011/9/21 谭军 <[EMAIL PROTECTED]>

> Hi,
> I want to use 2 MR jobs sequentially.
> The first job produces an intermediate result in a temp file.
> The second job reads the result from that temp file, not from the FileInputPath.
> I tried, but a FileNotFoundException was reported.
> Then I checked the datanodes; the temp file had been created.
> The first job was executed correctly.
> Why can't the second job find the file? The file was created before the
> second job was executed.
> Thanks!
>
> --
>
> Regards!
>
> Jun Tan
>
>
>
--
Regards,
Swathi.V.