Re: execute hadoop job from remote web application
Oleg Ruchovets 2011-10-18, 16:50
So you mean that if I submit the job remotely and my_hadoop_job.jar is
on the classpath of my web application, it will submit the job with
my_hadoop_job.jar to the remote Hadoop machine (cluster)?

On Tue, Oct 18, 2011 at 6:13 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Oleg,
>
> Steve already covered this.
>
> The "hadoop jar" subcommand merely runs the jar program for you, as a
> utility - it has nothing to do with submissions really.
>
> Have you tried submitting your program by running your jar as a
> regular java program (java -jar <jar>), with the proper classpath?
> (You may use "hadoop classpath" to get the classpath string.)
>
> It would go through fine, and submit the job jar with classes
> included, over to the JobTracker.
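>
> For illustration, a minimal sketch of such a driver (old mapred API;
> the host names, ports, and class name here are assumptions, not
> something from this thread):
>
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.mapred.FileInputFormat;
>     import org.apache.hadoop.mapred.FileOutputFormat;
>     import org.apache.hadoop.mapred.JobClient;
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class RemoteSubmitter {
>       public static void main(String[] args) throws Exception {
>         JobConf conf = new JobConf();
>         // Point the client at the remote cluster (assumed hosts/ports).
>         conf.set("fs.default.name", "hdfs://namenode-host:9000");
>         conf.set("mapred.job.tracker", "jobtracker-host:9001");
>         // my_hadoop_job.jar is on this JVM's classpath, so the jar
>         // containing this class is found and shipped with the job.
>         conf.setJarByClass(RemoteSubmitter.class);
>         // set mapper/reducer etc. here as in the real driver
>         FileInputFormat.setInputPaths(conf, new Path("/opt/inputs/"));
>         FileOutputFormat.setOutputPath(conf,
>             new Path("/data/output_jobs/output"));
>         JobClient.runJob(conf);  // blocks until the job finishes
>       }
>     }
>
> You would launch it with something like
> "java -cp my_hadoop_job.jar:$(hadoop classpath) RemoteSubmitter".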
>
> On Tue, Oct 18, 2011 at 9:13 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
> wrote:
> > Let me try to be more specific. It is not a dependent jar; it is a jar
> > which contains the map/reduce/combine classes and some business logic.
> > When executing our job from the command line, the class which parses
> > parameters and submits the job has this line of code:
> >    job.setJarByClass(HadoopJobExecutor.class);
> >
> > We execute it locally on the Hadoop master machine using a command such
> > as:
> > /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
> > -inputPath /opt/inputs/ -outputPath /data/output_jobs/output
> >
> > And of course my_hadoop_job.jar is found, because it is located on the
> > same machine.
> >
> > Now, suppose I am going to submit the job remotely (from a web
> > application), and I have the same line of code:
> > job.setJarByClass(HadoopJobExecutor.class);
> >
> > If my_hadoop_job.jar is located only on the remote Hadoop machine (on
> > its classpath), my JobClient will fail, because there is no job jar on
> > the local classpath (it is located on the remote Hadoop machine). Am I
> > right? I simply don't know how to submit a job remotely (in my case the
> > job is not just map/combine/reduce classes; it is a jar which contains
> > other classes too).
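> >
> > Would something like the following sketch work instead of
> > setJarByClass (the path is an assumed example, just for illustration)?
> >
> >    // Point the client at an explicit job jar file, which it uploads
> >    // with the submission, instead of searching the classpath for it.
> >    conf.setJar("/opt/webapp/lib/my_hadoop_job.jar");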
> >
> > Regarding remotely invoking the shell script that contains the hadoop
> > jar command with any required input arguments: it is possible to do
> > that with
> >    Runtime.getRuntime().exec(submitCommand.toString().split(" "));
> > but I prefer to use JobClient, because then I can monitor my job (get
> > counters and other useful information).
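> >
> > What I have in mind is roughly this fragment (old mapred API; "conf"
> > is the JobConf prepared for the job, details assumed; imports shown
> > for clarity):
> >
> >    import org.apache.hadoop.mapred.Counters;
> >    import org.apache.hadoop.mapred.JobClient;
> >    import org.apache.hadoop.mapred.RunningJob;
> >
> >    JobClient client = new JobClient(conf);
> >    RunningJob running = client.submitJob(conf);  // non-blocking submit
> >    while (!running.isComplete()) {
> >        System.out.printf("map %.0f%% reduce %.0f%%%n",
> >            running.mapProgress() * 100, running.reduceProgress() * 100);
> >        Thread.sleep(5000);  // poll the JobTracker every few seconds
> >    }
> >    Counters counters = running.getCounters();  // the counters I want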
> >
> > Thanks in advance
> > Oleg.
> >
> > On Tue, Oct 18, 2011 at 4:34 PM, Bejoy KS <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi Oleg
> >>          I haven't tried out a scenario like the one you mentioned, but
> >> I think there shouldn't be any issue in submitting a job that has some
> >> dependent classes which hold the business logic referred to from the
> >> mapper, reducer, or combiner. You should be able to do the job
> >> submission remotely the same way we were discussing in this thread. If
> >> you need to distribute any dependent jars/files along with the
> >> application jar, you can use the -libjars option on the CLI or use the
> >> DistributedCache methods like
> >> addArchiveToClassPath()/addFileToClassPath() in your Java code. If it
> >> is a dependent jar, it is better to deploy it in the cluster
> >> environment itself, so that you don't have to transfer the jar over
> >> the network again every time you submit your job.
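> >>
> >> A minimal sketch of the DistributedCache variant (the HDFS path is an
> >> assumed example; the jar must already be on HDFS, and "conf" is your
> >> job's Configuration):
> >>
> >>    import org.apache.hadoop.filecache.DistributedCache;
> >>    import org.apache.hadoop.fs.Path;
> >>
> >>    // Adds the dependent jar (already on HDFS) to the classpath of
> >>    // every task that the job runs.
> >>    DistributedCache.addFileToClassPath(
> >>        new Path("/apps/libs/business-logic.jar"), conf);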
> >>         Just a suggestion: if you can execute the job from within your
> >> Hadoop cluster, you don't have to do a remote job submission. You just
> >> need to remotely invoke the shell script that contains the hadoop jar
> >> command with any required input arguments. Sorry if I'm not getting
> >> your requirement exactly.
> >>
> >> Regards
> >> Bejoy.K.S
> >>
> >> On Tue, Oct 18, 2011 at 6:29 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Thank you all for your answers, but I still have a question:
> >> > currently we run our jobs using shell scripts which are located on
> >> > the Hadoop master machine.
> >> >
> >> > Here is an example of the command line:
> >>