Re: execute hadoop job from remote web application
Oleg Ruchovets 2011-10-18, 15:43
I'll try to be more specific. It is not a dependent jar. It is a jar which
contains the map/reduce/combine classes and some business logic.
When executing our job from the command line, the class which parses the
parameters and submits the job has a line of code:
We execute it locally on the hadoop master machine using a command such as:
/opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
-inputPath /opt/inputs/ -outputPath /data/output_jobs/output
and of course my_hadoop_job.jar is found because it is located on the same
Now, suppose I am going to submit the job remotely (from a web application),
and I have the same line of code.
If my_hadoop_job.jar is located on the remote hadoop machine (in its class
path), my jobClient will fail because there is no job jar in the local class
path (it is located on the remote hadoop machine). Am I right? I simply don't
know how to submit a job remotely (in my case the job is not just the
map/combine/reduce classes; it is a jar which contains other classes too).
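For illustration, here is a minimal sketch of the client-side settings a remote submission would typically need, assuming a Hadoop 0.20/1.x-era client. The property names fs.default.name, mapred.job.tracker and mapred.jar are standard for those releases; the host names, ports and local jar path are hypothetical. The values are collected in a plain map so the example runs without Hadoop on the class path; in real code each entry would be applied with conf.set(key, value) on the job's Configuration before submitting. The point is that the job jar has to be readable on the submitting machine, because the job client uploads it to the cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RemoteSubmitSettings {

    /**
     * Collects the properties a remote client would set before submitting.
     * All argument values are supplied by the caller; nothing here talks
     * to a cluster.
     */
    public static Map<String, String> build(String nameNodeUri,
                                            String jobTracker,
                                            String localJobJar) {
        Map<String, String> props = new LinkedHashMap<>();
        // Where the client finds HDFS (0.20/1.x property name).
        props.put("fs.default.name", nameNodeUri);
        // Where the client finds the JobTracker.
        props.put("mapred.job.tracker", jobTracker);
        // Path of the job jar on the submitting machine.
        props.put("mapred.jar", localJobJar);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical hosts, ports and path, for illustration only.
        Map<String, String> props = build(
                "hdfs://hadoop-master:8020",
                "hadoop-master:8021",
                "/opt/webapp/lib/my_hadoop_job.jar");
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

If the jar is instead placed on the web application's own class path, setJarByClass should be able to locate it there, and mapred.jar would not need to be set by hand.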
Regarding remotely invoking the shell script that contains the hadoop jar
command with any required input arguments:
It is possible to do it with Runtime.getRuntime().exec(
submitCommand.toString().split( " " ) );
But I prefer to use jobClient, because I can monitor my job (get counters
and other useful information).
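As a side note on the exec approach, splitting a command string on spaces breaks as soon as a path or argument contains a space. A minimal sketch of the alternative is to build the argument list directly and hand it to ProcessBuilder. The class and method names below are hypothetical; the paths and the -inputPath/-outputPath flags are the ones from the command above. The process is started on the machine that runs this code, so the hadoop launcher has to exist there.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HadoopCommand {

    /** Builds the argument list; each argument stays one list element. */
    public static List<String> build(String hadoopBin, String jobJar,
                                     String inputPath, String outputPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(hadoopBin);
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add("-inputPath");
        cmd.add(inputPath);
        cmd.add("-outputPath");
        cmd.add(outputPath);
        return cmd;
    }

    /** Starts the command, forwards its output, and returns the exit code. */
    public static int run(List<String> cmd)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "/opt/hadoop/bin/hadoop",
                "/opt/hadoop/hadoop-jobs/my_hadoop_job.jar",
                "/opt/inputs/",
                "/data/output_jobs/output");
        // Only prints the command; call run(cmd) to actually launch it.
        System.out.println(String.join(" ", cmd));
    }
}
```

This only returns an exit code, so it does not give access to counters or job status the way the job client does.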
Thanks in advance
On Tue, Oct 18, 2011 at 4:34 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Oleg
> I haven't tried out a scenario like the one you mentioned, but I think
> there shouldn't be any issue in submitting a job that has some dependent
> classes which hold the business logic referred to from the mapper, reducer or
> combiner. You should be able to do the job submission remotely the same way we
> were discussing in this thread. If you need to distribute any dependent
> jars/files along with the application jar, you can use the -libjars option
> in the CLI or use DistributedCache methods like
> addArchiveToClassPath()/addFileToClassPath() in your Java code. If it is a
> dependent jar, it is better to deploy it in the cluster environment
> itself so that you don't have to transfer the jar over the network
> every time you submit your job.
> Just a suggestion: if you can execute the job from within your
> hadoop cluster, you don't have to do a remote job submission. You just need
> to remotely invoke the shell script that contains the hadoop jar command
> with any required input arguments. Sorry if I'm not getting your requirement
> On Tue, Oct 18, 2011 at 6:29 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> > Thank you all for your answers, but I still have some questions.
> > Currently we run our jobs using shell scripts which are located on the
> > master machine.
> > Here is an example of the command line:
> > /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
> > -inputPath /opt/inputs/ -outputPath /data/output_jobs/output
> > my_hadoop_job.jar has a class which parses the input parameters and
> > submits a job.
> > Our code is very similar to what you wrote:
> > ......
> > job.setJarByClass(HadoopJobExecutor.class);
> > job.setMapperClass(MultipleOutputMap.class);
> > job.setCombinerClass(BaseCombine.class);
> > job.setReducerClass(HBaseReducer.class);
> > job.setOutputKeyClass(Text.class);
> > job.setOutputValueClass(MapWritable.class);
> > FileOutputFormat.setOutputPath(job, new Path(finalOutPutPath));
> > jobCompleteStatus = job.waitForCompletion(true);
> > ...............
> > My questions are:
> > 1) my_hadoop_job.jar contains other classes (business logic), not only the
> > Map, Combine and Reduce classes, and I still don't understand how I can
> > submit a job which needs all the classes from my_hadoop_job.jar.
> > 2) Do I need to submit my_hadoop_job.jar too? If yes, what is the way to