JobClient using deprecated JobConf
Martin Becker 2010-09-22, 09:37
Hello,
I am using Hadoop MapReduce version 0.20.2 and soon 0.21. I wanted to use the JobClient class to avoid using the command line interface. I noticed that JobClient still uses the deprecated JobConf class for job submissions. Are there any alternatives to JobClient that do not use the deprecated JobConf class?
Thanks in advance, Martin
Re: JobClient using deprecated JobConf
Amareshwari Sri Ramadasu 2010-09-22, 09:43
In 0.21, the JobClient methods are available in the org.apache.hadoop.mapreduce.Job and org.apache.hadoop.mapreduce.Cluster classes.
Re: JobClient using deprecated JobConf
Tom White 2010-09-22, 16:29
Note that JobClient, along with the rest of the "old" API in org.apache.hadoop.mapred, has been undeprecated in Hadoop 0.21.0 so you can continue to use it without warnings.
Tom
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-22, 16:59
Hello Tom,

But could I also use the new API by doing this?

    Configuration configuration = new Configuration();
    Cluster cluster = new Cluster(configuration);
    Job job = Job.getInstance(cluster);
    ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);

If I do this, I get a most peculiar error:

    java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecation(Ljava/lang/String;[Ljava/lang/String;)V

I looked into the source, and this method does exist. I used the precompiled jar files that come with the downloadable MapReduce package.
Martin
Re: JobClient using deprecated JobConf
Tom White 2010-09-22, 18:04
Martin,

Can you give more information about how you compiled and ran your job? It probably makes sense to open a JIRA (https://issues.apache.org/jira/browse/MAPREDUCE) to track this.

Cheers
Tom
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-23, 07:54
Sorry, I wrote that message right before leaving, hoping for a quick solution, and did not test enough. So never mind: I was including another project that, although it was not supposed to, still exported Hadoop classes (the old ones, obviously). I am sorry.

Martin
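A NoSuchMethodError for a method that visibly exists in the source usually means an older copy of the class was loaded first, which is what turned out to be the case here. A minimal JDK-only sketch for spotting such duplicates: it asks the class loader for every classpath entry that contains a given class file. The class name DuplicateClassFinder is hypothetical, and the "first entry usually wins" remark assumes the default parent-first class loading.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {

    /** Returns every classpath location that contains the given class, in lookup order. */
    static List<URL> copiesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = DuplicateClassFinder.class.getClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the NoSuchMethodError from this thread.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.conf.Configuration";
        List<URL> copies = copiesOf(name);
        System.out.println(copies.size() + " copies of " + name + " found:");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            // With parent-first delegation the first entry listed is usually the one loaded.
            System.out.println("Duplicate detected: check which jar comes first.");
        }
    }
}
```

Run it with the same classpath as the failing job (for example from the same Eclipse launch configuration); more than one line of output for Configuration points at the conflicting jar.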
Re: JobClient using deprecated JobConf
David Rosenstrauch 2010-09-22, 17:53
Hmmm. Any idea as to why the undeprecation? I thought the intention was to try to move everybody to the new API. Why the reversal?
Thanks,
DR
Re: JobClient using deprecated JobConf
Tom White 2010-09-22, 17:59
David,

This was discussed here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg01833.html. The reason is basically to give users more time to move to the new API. The old API will be marked as deprecated in 0.22.0.

Cheers,
Tom
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-23, 08:24
Hi,

I would still like to use the new API. So what I am trying to do now is to submit a job from Java code instead of through the command line interface. How do I do this? This is what I do at the moment:

* Clean start-up of Hadoop (formatted file system and all)
* Using the standard WordCount Mapper and Reducer, I wrote this main method:
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        Configuration configuration = new Configuration();
        InetSocketAddress socket = new InetSocketAddress("localhost", 9001);
        Cluster cluster = new Cluster(socket, configuration);

        FileSystem fs = cluster.getFileSystem();
        Path homeDirectory = fs.getHomeDirectory();

        Path input = new Path(homeDirectory, INPUT);
        Path output = new Path(homeDirectory, OUTPUT);

        fs.delete(output, true);
        fs.copyFromLocalFile(new Path("resources/test/wordcount/data/ipsum.txt"), new Path(input, "input.txt"));

        Job job = Job.getInstance(cluster);

        //1 job.addArchiveToClassPath(new Path("release/test.jar"));

        //2 job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
        //  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
        //  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));

        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

* I tried to run this code as is in Eclipse.
* Obviously, I guess, Hadoop needed the WordCount classes to work, so I got this error:
  java.lang.RuntimeException: java.lang.ClassNotFoundException: de.fstyle.hadoop.tutorial.wordcount.WordCount$Map
* Putting everything into a jar and adding the following line did not do any good:
  job.addArchiveToClassPath(new Path("release/test.jar"));
* Adding each class separately throws the same exception:
  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount.class"));
  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Map.class"));
  job.addFileToClassPath(new Path("bin/de/fstyle/hadoop/tutorial/wordcount/WordCount$Reduce.class"));
* Using
  job.setJar("release/test.jar");
  gets me
  java.io.FileNotFoundException: File /tmp/hadoop-martin/mapred/staging/martin/.staging/job_201009221802_0033/job.jar does not exist.
So how would I set this up and use it correctly? Sorry, I did not find any tutorials or examples anywhere.
Martin
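The ClassNotFoundException above shows the task side never saw the WordCount classes, and they normally reach the cluster inside a job jar. When launching from Eclipse, one workaround (an assumption, not something confirmed in this thread) is to pack the compiled output folder into a temporary jar at runtime and pass its path to job.setJar(...). A minimal JDK-only sketch; the class name ClassDirJar and the default "bin" folder are hypothetical, and building the jar alone does not explain the staging-directory FileNotFoundException reported above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClassDirJar {

    /**
     * Packs every regular file under classesDir (for example Eclipse's "bin" output folder)
     * into a temporary jar and returns its path. Entry names are relative to classesDir
     * and use '/' separators, as the jar format requires.
     */
    static Path jarClassesDir(Path classesDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        Path jar = Files.createTempFile("job-", ".jar");
        jar.toFile().deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
            for (Path file : files) {
                String name = classesDir.relativize(file).toString().replace(File.separatorChar, '/');
                if (name.equalsIgnoreCase("META-INF/MANIFEST.MF")) {
                    continue; // JarOutputStream has already written a manifest
                }
                out.putNextEntry(new JarEntry(name));
                Files.copy(file, out);
                out.closeEntry();
            }
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = jarClassesDir(Paths.get(args.length > 0 ? args[0] : "bin"));
        System.out.println("Built " + jar);
        // Hypothetical next step, using the Job from the main method above:
        //   job.setJar(jar.toString());
    }
}
```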
Re: JobClient using deprecated JobConf
Tom White 2010-09-23, 16:54
This tutorial should help: http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html

Tom
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-23, 17:22
Well, the tutorial tells me how to use the command line interface. That works fine: implementing the Tool interface and all. Skimming through the tutorial, I cannot find any way of actually submitting a job _not_ using the command line interface. I want a Java application to submit a job without having to call any script files. Can you give me a pointer?

Martin
Re: JobClient using deprecated JobConf
James Hammerton 2010-09-24, 10:33
Hi,

That tutorial includes Java source code that submits a job. Look at what main() and run() are doing.

Or are you trying to avoid using the "hadoop" command? Surely all you need to do with your Java app, once written, is run it via the "hadoop" command rather than via the "java" command?

James

James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton
Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-24, 15:12
Hi James,
I am trying to avoid calling any command-line command. I want to submit a job from within a Java application, if possible without packaging any jar file at all, but I guess that will be necessary to allow Hadoop to load the specific classes. The tutorial definitely does not contain any explicit Java code showing how to do this. Sorry for not stating my problem clearly:

Right now I want to use Eclipse to submit my job using the "Run as..." dialog. Later I want to embed that part in a Java application that submits configured jobs to a remote Hadoop system/cluster.
Regards, Martin
Re: JobClient using deprecated JobConf
David Rosenstrauch 2010-09-24, 15:29
On 09/24/2010 11:12 AM, Martin Becker wrote:
> I want to submit
> a job from within a java application.
This is very do-able. (I do this now.)
Here is a skeleton for how it can be done:
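    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class JobSubmitter implements Tool {

        private Configuration appConf;

        public static void main(String[] args) throws Exception {
            // ToolRunner strips generic options such as -libjars, -jt and -fs,
            // applies them to the Configuration, and passes the rest to run().
            System.exit(ToolRunner.run(new Configuration(), new JobSubmitter(), args));
        }

        public JobSubmitter() {
            // <your code here>
        }

        public Configuration getConf() {
            return appConf;
        }

        public void setConf(Configuration conf) {
            this.appConf = conf;
        }

        public int run(String[] args) throws Exception {
            Job job = new Job(appConf);
            Configuration jobConf = job.getConfiguration();
            // jobConf.set(<your code here>);
            // <your code here>
            job.submit();
            return 0;
        }
    }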
re: "without packing any jar file at all":
If you use Tool/ToolRunner (as we are doing above), your Hadoop app automatically handles some key command line arguments. One of them, which you will use here, is the -libjars argument. If you use -libjars and specify a list of jars that contain your code, ToolRunner will automatically take those jars and put them in the Distributed Cache on each task node, where they will be added to the classpath of every map/reduce task.
HTH,
DR
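The -libjars value is a single comma-separated list of jar paths. A minimal JDK-only sketch of collecting every jar in one directory into that form, so a Java launcher can build the argument itself instead of typing it; the class name LibJars and the convention of keeping dependencies in a single lib folder are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LibJars {

    /** Returns the absolute paths of all *.jar files directly inside dir, sorted and comma-separated. */
    static String fromDirectory(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars); // deterministic order
        return String.join(",", jars);
    }

    public static void main(String[] args) throws IOException {
        String value = fromDirectory(Paths.get(args.length > 0 ? args[0] : "lib"));
        // Hypothetical use with the JobSubmitter skeleton above:
        //   ToolRunner.run(new Configuration(), new JobSubmitter(),
        //                  new String[] {"-libjars", value, "input", "output"});
        System.out.println("-libjars " + value);
    }
}
```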
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-24, 16:42
Hello David,
Thanks for your suggestions. I fail to see how your approach differs from the one used in the tutorial. The -libjars option is a command line option of the Hadoop executable, and I do not want to call that executable. Maybe I am missing the point. My implementation is basically the same as your template, and using the Hadoop executable with my main jar and the additional jars loaded by -libjars works fine.
Regards,
Martin
Re: JobClient using deprecated JobConf
David Rosenstrauch 2010-09-24, 16:53
On 09/24/2010 12:42 PM, Martin Becker wrote:
> Hello David,
>
> Thanks for your suggestions. I fail to see where your approach is
> different from the one used in the tutorial.
The difference is that the tutorial launches the job using the "hadoop" executable:
$ bin/hadoop jar /user/joe/wordcount.jar org.myorg.WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output
With the example I gave, you would launch your app directly from the command line:
$ java -cp <jars> YourApp -libjars <jars> <parms>
> The -libjars option is a command line option of the Hadoop executable.
By implementing the Tool/ToolRunner approach, you make -libjars an option of your app too. That is why you can run it natively from the command line, without the hadoop executable, and still have it distribute the jars to the necessary places in the cluster.
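A simplified, hypothetical model of what ToolRunner does with the arguments may make this clearer. It is plain Java with no Hadoop classes, and the names GenericArgsSketch and split are invented for illustration: generic options such as -libjars, -fs and -jt are consumed first, and only the leftover arguments reach your Tool's run(String[]) method. In Hadoop itself this work is done by GenericOptionsParser, which, as far as I know, also stops at the first non-option argument, so the generic options have to come before your own parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: models how generic options are
// split off before Tool.run(String[]) sees the remaining arguments.
public class GenericArgsSketch {

    // Generic options recognized by this sketch; each expects exactly one value.
    private static final List<String> GENERIC_OPTIONS = Arrays.asList("-libjars", "-fs", "-jt");

    /** Result of splitting: consumed generic options plus the leftover application args. */
    public static final class Split {
        public final Map<String, String> generic = new LinkedHashMap<>();
        public final List<String> remaining = new ArrayList<>();
    }

    public static Split split(String[] args) {
        Split result = new Split();
        int i = 0;
        // Consume "-option value" pairs up to the first argument that is not a generic option.
        while (i + 1 < args.length && GENERIC_OPTIONS.contains(args[i])) {
            result.generic.put(args[i], args[i + 1]);
            i += 2;
        }
        // Everything from there on is what Tool.run(String[]) would receive.
        result.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return result;
    }

    public static void main(String[] args) {
        Split s = split(new String[] {
            "-libjars", "a.jar,b.jar", "-jt", "jt.example.com:9001", "/in", "/out"});
        System.out.println(s.generic);   // {-libjars=a.jar,b.jar, -jt=jt.example.com:9001}
        System.out.println(s.remaining); // [/in, /out]
    }
}
```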
HTH,
DR
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-24, 17:26
Hello David,
This will at best run my MapReduce process on the local Hadoop instance. What do I do to submit it to a remote Hadoop cluster using Java code?
Martin
Re: JobClient using deprecated JobConf
David Rosenstrauch 2010-09-24, 18:44
On 09/24/2010 01:26 PM, Martin Becker wrote:
> Hello David,
>
> This will at best run my MapReduce process on the local Hadoop instance.
> What do I do to submit it to a remote Hadoop cluster using Java code?
>
> Martin
$ java -cp <jars> YourApp -libjars <jars> -jt <hostname_of_job_tracker_in_remote_cluster:job_tracker_port_number> -fs <hdfs://hostname_of_name_node_in_remote_cluster:name_node_port_number> <parms>
DR
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-25, 14:24
Hello David,
Thanks a lot. Still, I want Java code to submit my application. I do not want to mess with any kind of command line arguments or an executable, neither Java nor Hadoop. I want to write a method that can set up and submit a job to an arbitrary cluster, something like calling CustomJob.submitJob(ip:port). This would be used by a GUI or another Java application to process data. I suspect that the Cluster and Job classes will solve my problem, as proposed earlier. The problem is the missing job.jar, as also described earlier. I will start a new thread describing my problem using a more accurate header.
Thank you, Martin
Re: JobClient using deprecated JobConf
David Rosenstrauch 2010-09-27, 15:54
On 09/25/2010 10:24 AM, Martin Becker wrote:
> Hello David,
>
> thanks a lot. Yet I want java code to submit my application. I do not
> want to mess with any kind of command line arguments or an executable,
> neither Java nor Hadoop. I want to write a method that can set up and
> submit a job to an arbitrary cluster. Something like calling
> CustomJob.submitJob(ip:port). This would be used by a GUI or another
> java application to process data.
Also easy to do.
All that the -jt and -fs parms do is eventually set appropriate values in a Configuration object. You can just as easily do this programmatically.
i.e., for -jt:
conf.set("mapred.job.tracker", <hostname_of_job_tracker_in_remote_cluster:job_tracker_port_number>);
for -fs:
conf.set("fs.default.name", <hdfs://hostname_of_name_node_in_remote_cluster:name_node_port_number>);
On a side note: as far as your requirement "I do not want to mess with any kind of command line arguments or an executable, neither Java nor Hadoop" goes, I'm not sure how feasible it is, because of the -libjars command line parm.
It's easy to write code to handle what the -fs and -jt command line parms do (see above). But -libjars is much more complicated. It takes the list of jars that you give it, sends each one to the Hadoop DistributedCache so that it gets distributed to each node in the cluster, and then adds each one to the classpath of each map/reduce task.
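The point that -jt and -fs only end up as Configuration entries can be sketched without Hadoop on the classpath. The helper below is hypothetical (RemoteClusterSettings and settings are made-up names); it collects, as plain key/value pairs, what those options would set, using the keys from this message. The "tmpjars" key used for -libjars is my assumption about Hadoop 0.20/0.21 internals, not a documented API, and the sketch only records the jar list: the upload to the DistributedCache and the classpath setup, which is the complicated part described above, are not modeled.

```java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: in real code each entry would be applied with
// conf.set(key, value) on an org.apache.hadoop.conf.Configuration.
public class RemoteClusterSettings {

    public static Map<String, String> settings(String jobTrackerHostPort,
                                               String nameNodeHostPort,
                                               List<String> localJars) {
        Map<String, String> s = new LinkedHashMap<>();
        // What "-jt host:port" sets.
        s.put("mapred.job.tracker", jobTrackerHostPort);
        // What "-fs hdfs://host:port" sets.
        s.put("fs.default.name", "hdfs://" + nameNodeHostPort);
        // Rough stand-in for "-libjars" (assumed key "tmpjars"): a comma-separated
        // list of absolute jar URIs. Nothing is copied to the cluster here.
        if (!localJars.isEmpty()) {
            String jars = localJars.stream()
                    .map(j -> Paths.get(j).toAbsolutePath().toUri().toString())
                    .collect(Collectors.joining(","));
            s.put("tmpjars", jars);
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, String> s = settings("jt.example.com:9001", "nn.example.com:9000",
                Arrays.asList("lib/myjob.jar"));
        // Prints one "key = value" line per setting.
        s.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A method like the CustomJob.submitJob(ip:port) Martin describes could apply such entries to its Configuration before creating the Job, but it would still need the job's classes shipped to the cluster somehow.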
Yes, I suppose you could try to write code to do that yourself, but I can't see why you would want to reinvent the wheel here. So this makes me question whether that requirement really makes sense.
> The problem is the
> missing job.jar, as also described earlier. I will start a new thread
> describing my problem using a more accurate header.
>
> Thank you,
> Martin
I'm not sure what you're referring to re: job.jar.
DR
Re: JobClient using deprecated JobConf
James Hammerton 2010-09-24, 15:32
http://wiki.apache.org/hadoop/EclipsePlugIn may be of interest for the former requirement.
Regards,
James
On Fri, Sep 24, 2010 at 4:12 PM, Martin Becker <[EMAIL PROTECTED]> wrote:
> Hi James,
>
> I am trying to avoid to call any command line command. I want to submit a
> job from within a java application. If possible without packing any jar file
> at all. But I guess that will be necessary to allow Hadoop to load the
> specific classes. The tutorial definitely does not contain any explicit java
> code how to do this. Sorry, for not stating my problem clearly:
>
> Right now I want to use Eclipse to submit my job by using the "Run
> as..." dialog. Later I want to embed that part in a java application
> submitting configured jobs to a remote Hadoop system/cluster.
>
> Regards,
> Martin
--
James Hammerton | Senior Data Mining Engineer
www.mendeley.com/profiles/james-hammerton
Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
Re: JobClient using deprecated JobConf
Martin Becker 2010-09-24, 15:40
Thank you, James. The JIRA issue (https://issues.apache.org/jira/browse/MAPREDUCE-1262) states that this plugin is only for Hadoop 0.20.0/1, yet I need support for 0.21.0. And my actual goal is to submit jobs to a remote Hadoop cluster, probably using the Cluster and Job classes as suggested earlier (formerly this was done with JobClient, but I want to avoid that class, as it seems likely to be marked deprecated sooner or later).
Thanks again,
Martin