Re: execute hadoop job from remote web application
So you mean that if I submit the job remotely and my_hadoop_job.jar is on
the classpath of my web application, it will submit the job with
my_hadoop_job.jar to the remote Hadoop machine (cluster)?
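For illustration, a minimal sketch of what that kind of remote submission could
look like. The host names and ports below are placeholders, HadoopJobExecutor is
assumed to be packaged inside my_hadoop_job.jar on the web application's
classpath, and the 0.20-era API of this thread is used:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RemoteSubmitSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the remote cluster (placeholder addresses).
            conf.set("fs.default.name", "hdfs://hadoop-master:9000");
            conf.set("mapred.job.tracker", "hadoop-master:9001");

            Job job = new Job(conf, "my-hadoop-job");
            // HadoopJobExecutor lives in my_hadoop_job.jar on the local (webapp)
            // classpath, so setJarByClass() finds that jar locally and it is
            // shipped to the cluster as the job jar during submission.
            job.setJarByClass(HadoopJobExecutor.class);
            // ... set mapper/reducer/combiner, input and output paths ...

            job.submit();   // uploads the job jar and submits to the JobTracker
        }
    }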

On Tue, Oct 18, 2011 at 6:13 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Oleg,
>
> Steve already covered this.
>
> The "hadoop jar" subcommand merely runs the jar program for you, as a
> utility - it has nothing to do with submissions really.
>
> Have you tried submitting your program by running your jar as a
> regular java program (java -jar <jar>), with the proper classpath?
> (You may use "hadoop classpath" to get a string.)
>
> It would go through fine, and submit the job jar with classes
> included, over to the JobTracker.
>
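A rough sketch of what Harsh describes; the real HadoopJobExecutor in this
thread is not shown, so the class below is only illustrative, and the launch
command in the comment uses placeholder paths:

    // Launched as a plain Java process instead of via "hadoop jar", e.g.:
    //   java -cp "$(hadoop classpath):/path/to/my_hadoop_job.jar" HadoopJobExecutor \
    //        -inputPath /opt/inputs/ -outputPath /data/output_jobs/output
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class HadoopJobExecutor extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            // getConf() is populated from the *-site.xml files found on the
            // classpath; the "hadoop classpath" string usually includes the conf dir.
            Job job = new Job(getConf(), "my-hadoop-job");
            job.setJarByClass(HadoopJobExecutor.class);
            // ... parse -inputPath / -outputPath from args and configure the job ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new HadoopJobExecutor(), args));
        }
    }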
> On Tue, Oct 18, 2011 at 9:13 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
> wrote:
> > I'll try to be more specific. It is not a dependent jar; it is a jar which
> > contains the map/reduce/combine classes and some business logic. When
> > executing our job from the command line, the class which parses the
> > parameters and submits the job has this line of code:
> >    job.setJarByClass(HadoopJobExecutor.class);
> >
> > We execute it locally on the Hadoop master machine using a command such
> > as:
> > /opt/hadoop/bin/hadoop jar /opt/hadoop/hadoop-jobs/my_hadoop_job.jar
> > -inputPath /opt/inputs/  -outputPath /data/output_jobs/output
> >
> > and of course my_hadoop_job.jar is found, because it is located on the
> > same machine.
> >
> > Now, suppose I am going to submit the job remotely (from a web
> > application), and I have the same line of code:
> > job.setJarByClass(HadoopJobExecutor.class);
> >
> > If my_hadoop_job.jar is located only on the remote Hadoop machine (on its
> > classpath), my JobClient will fail because there is no job jar on the local
> > classpath (it is located on the remote Hadoop machine). Am I right? I
> > simply don't know how to submit a job remotely (in my case the job is not
> > just map/combine/reduce classes; it is a jar which contains other classes
> > too).
> >
> > Regarding remotely invoking the shell script that contains the hadoop jar
> > command with any required input arguments: it is possible to do that with
> > Runtime.getRuntime().exec( submitCommand.toString().split( " " ) );
> > but I prefer to use JobClient, because I can monitor my job (get counters
> > and other useful information).
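A rough sketch of that JobClient-based submission and monitoring (old
org.apache.hadoop.mapred API; the actual job configuration is elided):

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitAndMonitorSketch {
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf();
            jobConf.setJarByClass(SubmitAndMonitorSketch.class);
            // ... set mapper/reducer/combiner, input and output paths ...

            JobClient client = new JobClient(jobConf);
            RunningJob running = client.submitJob(jobConf);   // returns immediately

            while (!running.isComplete()) {
                System.out.printf("map %.0f%%  reduce %.0f%%%n",
                        running.mapProgress() * 100, running.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            // Counters and other job information are available from RunningJob.
            Counters counters = running.getCounters();
            System.out.println(counters);
        }
    }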
> >
> > Thanks in advance
> > Oleg.
> >
> > On Tue, Oct 18, 2011 at 4:34 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Oleg
> >>          I haven't tried a scenario like the one you mentioned, but I
> >> think there shouldn't be any issue in submitting a job that has some
> >> dependent classes which hold the business logic referenced from the
> >> mapper, reducer or combiner. You should be able to do the job submission
> >> remotely the same way we were discussing in this thread. If you need to
> >> distribute any dependent jars/files along with the application jar, you
> >> can use the -libjars option on the CLI or use the DistributedCache
> >> methods like addArchiveToClassPath()/addFileToClassPath() in your Java
> >> code. If it is a dependent jar, it is better to deploy it in the cluster
> >> environment itself so that you don't have to transfer the jar over the
> >> network again and again every time you submit your job.
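As a small, illustrative sketch of the DistributedCache calls mentioned above
(the HDFS paths are placeholders and the dependency jars must already exist
there):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class DistCacheSketch {
        // Call this on the job's Configuration before submitting.
        static void addDependencies(Configuration conf) throws IOException {
            DistributedCache.addFileToClassPath(new Path("/libs/business-logic.jar"), conf);
            DistributedCache.addArchiveToClassPath(new Path("/libs/more-deps.zip"), conf);
        }
    }

    // The CLI equivalent is the -libjars option (it applies when the driver
    // goes through ToolRunner/GenericOptionsParser), e.g.:
    //   hadoop jar my_hadoop_job.jar -libjars /local/path/business-logic.jar ...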
> >>         Just a suggestion: if you can execute the job from within your
> >> Hadoop cluster, you don't have to do a remote job submission. You just
> >> need to remotely invoke the shell script that contains the hadoop jar
> >> command with any required input arguments. Sorry if I'm not getting your
> >> requirement exactly.
> >>
> >> Regards
> >> Bejoy.K.S
> >>
> >> On Tue, Oct 18, 2011 at 6:29 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
> >>
> >> > Thank you all for your answers, but I still have a question:
> >> >  Currently we run our jobs using shell scripts which are located on
> >> > the Hadoop master machine.
> >> >
> >> > Here is an example of the command line:
> >>