+ Adding on to Joey's comments
If you want to avoid distributing the dependent jars on every job
submission, you can pre-distribute them to all the nodes once and add them
to the classpath of every node. This approach makes sense if you
periodically run jobs on your cluster, at a high frequency, that need the
same external jars.
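A rough sketch of that pre-distribution step (the node names, target directory, and hadoop-env.sh location below are placeholders; adjust them for your own cluster):

```shell
# Sketch only: node names and /opt/hadoop/extra-jars are hypothetical.
for node in node1 node2 node3; do
  ssh "$node" mkdir -p /opt/hadoop/extra-jars
  scp dependent-1.jar dependent-2.jar "$node":/opt/hadoop/extra-jars/
done

# Then, on every node, extend the classpath once, e.g. in
# $HADOOP_HOME/conf/hadoop-env.sh, and restart the daemons:
#   export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/hadoop/extra-jars/*"
```

After this one-time setup, jobs can reference the external classes without shipping the jars on each submission.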
On Tue, Mar 6, 2012 at 9:23 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> If you're using -libjars, there's no reason to copy the jars into
> $HADOOP lib. You may have to add the jars to the HADOOP_CLASSPATH if
> you use them from your main() method:
> export HADOOP_CLASSPATH=dependent-1.jar:dependent-2.jar
> hadoop jar main.jar demo.MyJob -libjars
> dependent-1.jar,dependent-2.jar -Dmapred.input.dir=/input/path
> On Tue, Mar 6, 2012 at 10:37 AM, Jane Wayne <[EMAIL PROTECTED]> wrote:
> > currently, i have my main jar and then 2 dependent jars. what i do is
> > 1. copy dependent-1.jar to $HADOOP/lib
> > 2. copy dependent-2.jar to $HADOOP/lib
> > then, when i need to run my job, MyJob inside main.jar, i do the following:
> > hadoop jar main.jar demo.MyJob -libjars dependent-1.jar,dependent-2.jar
> > -Dmapred.input.dir=/input/path -Dmapred.output.dir=/output/path
> > what i want to do is NOT copy the dependent jars to $HADOOP/lib and
> > specify -libjars. is there any way around this multi-step procedure? i
> > really do not want to clutter $HADOOP/lib or specify a comma-delimited
> > list of jars for -libjars.
> > any help is appreciated.
> Joseph Echeverria
> Cloudera, Inc.
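One more option that avoids both copying into $HADOOP/lib and listing jars with -libjars: package the dependencies in a lib/ directory inside the job jar itself; Hadoop adds jars found under lib/ in the job jar to the task classpath. Note this covers the tasks only; if the classes are also used in main(), they still have to be visible to the client JVM. A sketch (the build/ paths are hypothetical):

```shell
# Sketch: repack main.jar with its dependencies under lib/.
mkdir -p build/lib
cp dependent-1.jar dependent-2.jar build/lib/
(cd build && jar uf ../main.jar lib)

# Now only the job jar is shipped; no -libjars, nothing in $HADOOP/lib.
hadoop jar main.jar demo.MyJob \
  -Dmapred.input.dir=/input/path -Dmapred.output.dir=/output/path
```

Build tools can produce this layout automatically (e.g. a Maven assembly that places runtime dependencies under lib/), so the repacking step does not have to be done by hand.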