Re: help in distribution of a task with hadoop
1) A standard way of doing it would be to have all your files' content
inside HDFS. You could then process <key,value> pairs where the key would be
the name of the file and the value its contents. It would improve performance:
data locality, less network traffic... But you may have constraints...
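
For illustration only, a rough (untested) sketch of such a mapper, using the
old mapred API to match the stack trace below. It assumes an input format that
hands the mapper one record per file as <file name, file contents> (for
instance a custom whole-file input format, which Hadoop does not ship out of
the box); the class name and the "processing" are just placeholders:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Map-only processing: key = file name, value = full file contents.
public class WholeFileMapper extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {

  public void map(Text fileName, Text fileContents,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // The contents are already local to the task, so no extra reads over
    // the network are needed here. Replace this with your real processing.
    String processed = fileContents.toString().toLowerCase();
    output.collect(fileName, new Text(processed));
  }
}

If no reducer is needed, setting the number of reduce tasks to 0 writes the
map output straight to HDFS.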

2) Maven is a simple way of doing it (it can bundle your job and its
dependencies into a single jar).

Regards

Bertrand

On Mon, Aug 13, 2012 at 7:59 PM, Pierre Antoine DuBoDeNa
<[EMAIL PROTECTED]> wrote:

> Hello,
>
> We use hadoop to distribute a task over our machines.
>
> This task requires only the mapper class to be defined. We want to do some
> text processing on thousands of documents. So we create key-value pairs,
> where the key is just an increasing number and the value is the path of the
> file to be processed.
>
> We face a problem including an external jar file/class while running a jar
> file.
>
> $ mkdir Rdg_classes
> $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d Rdg_classes Rdg.java
> $ jar -cvf Rdg.jar -C Rdg_classes/ .
> We have tried the following options:
>
> *1. Set HADOOP_CLASSPATH to the location of the external jar files or
> external classes.*
> It doesn't help. Instead, it stops recognizing the Reducer class and fails
> with the error below:
>
> java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:899)
>     at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1028)
>     at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1380)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:981)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:891)
>     ... 10 more
> Caused by: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
>     ... 11 more
>
> *2. Use -libjars option as below:*
> hadoop jar Rdg.jar my.hadoop.Rdg -libjars Rdg_lib/* tester rdg_output
>
> Where Rdg_lib is a folder containing all the required classes/jars, stored
> on HDFS.
> But it treats -libjars as an input path and gives this error:
>
> 12/08/10 08:16:24 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
>
> Is there any other way to do it, or are we doing something wrong?
>
> Best,
>
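
PS: about the -libjars attempt quoted above: generic options like -libjars
are only stripped from the argument list when the driver goes through
ToolRunner / GenericOptionsParser; otherwise they are handed to your main()
unchanged and end up being treated as input paths, which is what the
"Input path does not exist: .../-libjars" error suggests. Also, -libjars
expects a comma-separated list of jar paths rather than a shell glob. An
untested sketch of a driver wired up that way (class, package and path names
are just placeholders to adapt to your Rdg job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RdgDriver extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // By the time run() is called, ToolRunner has already consumed -libjars,
    // so args only contains the input and output paths.
    JobConf conf = new JobConf(getConf(), RdgDriver.class);
    conf.setJobName("rdg");
    // set mapper class, output key/value types, etc. here ...
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new RdgDriver(), args));
  }
}

It would then be launched with something like:

hadoop jar Rdg.jar my.hadoop.RdgDriver -libjars /path/to/dep1.jar,/path/to/dep2.jar tester rdg_output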

--
Bertrand Dechoux