|
Pierre Antoine DuBoDeNa
2012-08-13, 17:59
Bertrand Dechoux
2012-08-13, 18:22
Pierre Antoine DuBoDeNa
2012-08-13, 18:27
Bejoy Ks
2012-08-13, 18:29
Pierre Antoine DuBoDeNa
2012-08-13, 18:32
|
-
help in distribution of a task with hadoop
Pierre Antoine DuBoDeNa 2012-08-13, 17:59
Hello,

We use Hadoop to distribute a task over our machines.

This task requires only the mapper class to be defined. We want to do some text processing on thousands of documents, so we create key-value pairs where the key is just an increasing number and the value is the path of the file to be processed.

We have a problem including an external jar file/class when running our jar file.

$ mkdir Rdg_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d Rdg_classes Rdg.java
$ jar -cvf Rdg.jar -C Rdg_classes/ .

We have tried the following options:

*1. Set HADOOP_CLASSPATH with the location of external jar files or external classes.*

It doesn't help. Instead, the Reducer is no longer recognized and we get the error below:

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:899)
    at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1028)
    at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1380)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:981)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:891)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: hadoop.Rdg$Reduce
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865)
    ... 11 more

*2. Use the -libjars option as below:*

hadoop jar Rdg.jar my.hadoop.Rdg -libjars Rdg_lib/* tester rdg_output

where Rdg_lib is a folder containing all required classes/jars, stored on HDFS. But it starts reading -libjars as an input path and gives this error:

12/08/10 08:16:24 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars

Is there any other way to do it, or are we doing something wrong?

Best,
-
Re: help in distribution of a task with hadoop
Bertrand Dechoux 2012-08-13, 18:22
1) A standard way of doing it would be to have all your files' content inside HDFS. You could then process <key,value> pairs where the key is the name of the file and the value is its contents. That would improve performance (data locality, less network traffic...). But you may have constraints...

2) Maven is a simple way of doing it.

Regards

Bertrand
-
Re: help in distribution of a task with hadoop
Pierre Antoine DuBoDeNa 2012-08-13, 18:27
We have all documents moved to HDFS. I understand that our first option needs more I/O than what you describe, but let's say that's not a problem for now.

Could you please point me to option 2)? How could we do that? Is there any tutorial or example?

Thanks
-
Re: help in distribution of a task with hadoop
Bejoy Ks 2012-08-13, 18:29
Hi Bertrand

The -libjars option works well with the 'hadoop jar' command. Instead of executing your runnable with the plain java 'jar' command, use 'hadoop jar'. When you use 'hadoop jar' you can ship the dependent jars/files etc. in either of these ways:

1) include them in the /lib folder in your jar
2) use -libjars / -files to distribute jars or files

Regards
Bejoy KS
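An editorial sketch on option 2) above: -libjars takes one comma-separated list of jar paths, whereas a glob such as Rdg_lib/* is expanded by the local shell (when it matches local files) into several space-separated arguments. The Java snippet below builds such a list from file names. The class name LibJarsArg and the sample file names are hypothetical, chosen only for illustration; nothing here comes from the Rdg code in the thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of Hadoop and not part of the Rdg job.
class LibJarsArg {

    /** Keeps only the paths ending in ".jar", sorts them, and joins them with commas. */
    static String join(List<String> paths) {
        List<String> jars = new ArrayList<>();
        for (String p : paths) {
            if (p.endsWith(".jar")) {
                jars.add(p);
            }
        }
        Collections.sort(jars);
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // Assumption: the jars sit in a local directory, "Rdg_lib" by default.
        File dir = new File(args.length > 0 ? args[0] : "Rdg_lib");
        String[] names = dir.list(); // null when dir is not a readable directory
        List<String> paths = new ArrayList<>();
        if (names != null) {
            for (String n : names) {
                paths.add(new File(dir, n).getPath());
            }
        }
        System.out.println(join(paths));
    }
}
```

The printed value could then be passed as the single argument that follows -libjars on the 'hadoop jar' command line. Whether that alone is enough depends on how the driver parses its arguments.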
-
Re: help in distribution of a task with hadoop
Pierre Antoine DuBoDeNa 2012-08-13, 18:32
You mean like that:

hadoop jar Rdg.jar my.hadoop.Rdg -libjars Rdg_lib/* tester rdg_output

where Rdg_lib is a folder containing all required classes/jars, stored on HDFS.

We get this error, though. Are we doing something wrong?

12/08/10 08:16:24 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameofserver:54310/user/hduser/-libjars
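An editorial note on the error above. In Hadoop, generic options such as -libjars and -files are handled by GenericOptionsParser, which is applied when the driver is launched through ToolRunner (typically by implementing Tool). One plausible cause of the error, assuming the Rdg driver reads args[0] directly as the input path (the thread does not show the driver), is that -libjars reaches the job unparsed. The sketch below is a simplified stand-in written with the Java standard library only; it is not Hadoop's parser, and GenericArgsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: this mimics the idea of peeling off
// generic options before the application sees its own arguments.
class GenericArgsSketch {

    // Options that take exactly one value in this sketch.
    private static final List<String> VALUE_OPTIONS =
            Arrays.asList("-libjars", "-files", "-archives");

    final Map<String, String> options = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    GenericArgsSketch(String[] args) {
        int i = 0;
        // Consume leading "option value" pairs.
        while (i + 1 < args.length && VALUE_OPTIONS.contains(args[i])) {
            options.put(args[i], args[i + 1]);
            i += 2;
        }
        // Everything else belongs to the application (input path, output path, ...).
        while (i < args.length) {
            remaining.add(args[i++]);
        }
    }

    public static void main(String[] args) {
        String[] cmd = {"-libjars", "Rdg_lib/a.jar,Rdg_lib/b.jar", "tester", "rdg_output"};

        // A driver that uses the raw array treats the option name as its input path.
        System.out.println("unparsed input path: " + cmd[0]);

        // After the generic options are removed, the first argument is the real input.
        GenericArgsSketch parsed = new GenericArgsSketch(cmd);
        System.out.println("parsed input path:   " + parsed.remaining.get(0));
        System.out.println("libjars value:       " + parsed.options.get("-libjars"));
    }
}
```

With the sample arguments, the unparsed view takes -libjars as the input path, which matches the "Input path does not exist: .../-libjars" message, while the parsed view yields tester. In this sketch a glob-expanded list (several jar paths separated by spaces) would leave the second jar as the first remaining argument, which is one more reason to pass the jars as a single comma-separated value.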