springring, 2013-03-15, 09:18
Steve Loughran, 2013-03-15, 10:58
Harsh J, 2013-03-17, 06:22
springring, 2013-03-18, 01:52
Harsh J, 2013-03-18, 01:57
springring, 2013-03-18, 02:48
-
how to define new InputFormat with streaming?
springring, 2013-03-15, 09:18
Hi,
My Hadoop version is Hadoop 0.20.2-cdh3u3. I want to define a new InputFormat as in the Hadoop book, but I get the error "class org.apache.hadoop.streaming.WholeFileInputFormat not org.apache.hadoop.mapred.InputFormat". The Hadoop version is 0.20, so does Streaming still depend on the 0.10 mapred API?

The details:

****************************************
javac -classpath /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class7 ./*.java
cd class7
jar uf /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar org/apache/hadoop/streaming/*.class
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -inputformat WholeFileInputFormat -mapper xmlmappertest.py -file xmlmappertest.py -input /user/hdfs/tarcatalog -output /user/hive/external/catalog -jobconf mapred.map.tasks=108

13/03/15 16:27:51 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.streaming.WholeFileInputFormat not org.apache.hadoop.mapred.InputFormat
    at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1070)
    at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:609)
    at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:707)
    at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:122)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)

**************** the code from the Hadoop book ****************

WholeFileInputFormat.java

// cc WholeFileInputFormat An InputFormat for reading a whole file as a record
import java.io.IOException;

import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.*;

//vv WholeFileInputFormat
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException,
      InterruptedException {
    WholeFileRecordReader reader = new WholeFileRecordReader();
    reader.initialize(split, context);
    return reader;
  }
}
//^^ WholeFileInputFormat

WholeFileRecordReader.java

// cc WholeFileRecordReader The RecordReader used by WholeFileInputFormat for reading a whole file as a record
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

//vv WholeFileRecordReader
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit fileSplit;
  private Configuration conf;
  private BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    if (!processed) {
      byte[] contents = new byte[(int) fileSplit.getLength()];
      Path file = fileSplit.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }
    return false;
  }

  @Override
  public NullWritable getCurrentKey() throws IOException, InterruptedException {
    return NullWritable.get();
  }

  @Override
  public BytesWritable getCurrentValue() throws IOException,
      InterruptedException {
    return value;
  }

  @Override
  public float getProgress() throws IOException {
    return processed ? 1.0f : 0.0f;
  }

  @Override
  public void close() throws IOException {
    // do nothing
  }
}
//^^ WholeFileRecordReader
-
Re: how to define new InputFormat with streaming?
Steve Loughran, 2013-03-15, 10:58
On 15 March 2013 09:18, springring <[EMAIL PROTECTED]> wrote:
> Hi,
>
> my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
> InputFormat in hadoop book , but there is error
> "class org.apache.hadoop.streaming.WholeFileInputFormat not
> org.apache.hadoop.mapred.InputFormat"
>
> Hadoop version is 0.20, but the streaming still depend on 0.10 mapred api?
>

1. Please don't spam all the lists.
2. Grab a later version of the Apache releases if you want help with them on these mailing lists, or go to the Cloudera lists, where they will probably say "upgrade to CDH 4.x" before asking questions.

thanks
-
Re: how to define new InputFormat with streaming?
Harsh J, 2013-03-17, 06:22
The issue is that Streaming expects the old/stable MR API
(org.apache.hadoop.mapred.InputFormat) as its input format class, but your WholeFileInputFormat is written against the new MR API (org.apache.hadoop.mapreduce.InputFormat, via org.apache.hadoop.mapreduce.lib.input.FileInputFormat). Rewriting it with the older API will get you past this error.

This has nothing to do with your version/distribution of Hadoop.

--
Harsh J
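The type check behind this error can be modeled without Hadoop. The stack trace shows JobConf.setInputFormat calling Configuration.setClass, and judging from the error text, that check rejects any class not assignable to the expected interface and reports "<class> not <interface name>". The sketch below is a minimal stand-alone model under that assumption; MapredInputFormat, MapreduceInputFormat and the two WholeFile classes are hypothetical stand-ins, not Hadoop types.

```java
// Stand-alone model (no Hadoop needed) of the type check that produces
// "class ... not org.apache.hadoop.mapred.InputFormat".
// All types below are hypothetical stand-ins for the real Hadoop types.
public class SetClassCheckDemo {

    // Stand-in for the old-API interface org.apache.hadoop.mapred.InputFormat.
    interface MapredInputFormat {}

    // Stand-in for the new-API abstract class org.apache.hadoop.mapreduce.InputFormat.
    static abstract class MapreduceInputFormat {}

    // A format written against the new API, like the one from the book.
    static class NewApiWholeFileFormat extends MapreduceInputFormat {}

    // The same idea ported to the old API.
    static class OldApiWholeFileFormat implements MapredInputFormat {}

    // Models the check in Configuration.setClass(name, theClass, xface);
    // the property name is dropped here because only the type test matters.
    static void setClass(Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
    }

    // Returns "accepted", or the rejection message if the check fails.
    static String tryRegister(Class<?> candidate) {
        try {
            setClass(candidate, MapredInputFormat.class);
            return "accepted";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // First line is the rejection message, second line is "accepted".
        System.out.println(tryRegister(NewApiWholeFileFormat.class));
        System.out.println(tryRegister(OldApiWholeFileFormat.class));
    }
}
```

Because a new-API class never implements the old interface, no classpath or jar change can make this check pass; the format itself has to be ported to org.apache.hadoop.mapred.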
-
Re: Re: how to define new InputFormat with streaming?
springring, 2013-03-18, 01:52
thanks
I modified the Java file to use the old "mapred" API, but there is still an error:

javac -classpath /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class9 ./*.java
./WholeFileInputFormat.java:16: error: package org.apache.hadoop.mapred.lib.input does not exist
import org.apache.hadoop.mapred.lib.input.*;

Is this because hadoop-0.20.2-cdh3u3 does not include the "mapred" API?
-
Re: Re: how to define new InputFormat with streaming?
Harsh J, 2013-03-18, 01:57
It isn't as easy as changing that import line:
> package org.apache.hadoop.mapred.lib.input does not exist

The right package is org.apache.hadoop.mapred.

--
Harsh J
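Changing the import alone is not enough because the old API has a different shape. In org.apache.hadoop.mapred, FileInputFormat's getRecordReader(InputSplit, JobConf, Reporter) takes the place of createRecordReader, isSplitable takes (FileSystem, Path), and RecordReader is an interface whose next(key, value) fills caller-supplied holders instead of exposing nextKeyValue() and getCurrentValue(). The sketch below models only that reader contract so it runs without Hadoop; OldApiRecordReader, NoKey and Bytes are hypothetical stand-ins for RecordReader, NullWritable and BytesWritable, and a real port would read the FileSplit through FileSystem as the book's code does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hadoop-free model of the old-API (org.apache.hadoop.mapred) RecordReader
// contract. OldApiRecordReader, NoKey and Bytes are hypothetical stand-ins.
public class OldApiWholeFileReaderDemo {

    // Same method set as org.apache.hadoop.mapred.RecordReader<K, V>.
    interface OldApiRecordReader<K, V> {
        boolean next(K key, V value) throws IOException;
        K createKey();
        V createValue();
        long getPos() throws IOException;
        void close() throws IOException;
        float getProgress() throws IOException;
    }

    // Stand-in for NullWritable: carries no data.
    static final class NoKey {}

    // Stand-in for BytesWritable: a mutable holder the caller reuses.
    static final class Bytes {
        byte[] data = new byte[0];
        void set(byte[] b) { data = b; }
    }

    // Emits the entire file as a single record, then reports end of input.
    static final class WholeFileReader implements OldApiRecordReader<NoKey, Bytes> {
        private final Path file;
        private boolean processed = false;
        private long pos = 0;

        WholeFileReader(Path file) { this.file = file; }

        @Override
        public boolean next(NoKey key, Bytes value) throws IOException {
            if (processed) {
                return false;               // only one record per file
            }
            byte[] contents = Files.readAllBytes(file);
            value.set(contents);            // fill the caller's holder in place
            pos = contents.length;
            processed = true;
            return true;
        }

        @Override public NoKey createKey() { return new NoKey(); }
        @Override public Bytes createValue() { return new Bytes(); }
        @Override public long getPos() { return pos; }
        @Override public void close() { /* nothing is held open */ }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    }

    // Drives the reader the old-API way: create holders, call next() until false.
    static int countRecords(Path file) throws IOException {
        WholeFileReader reader = new WholeFileReader(file);
        NoKey key = reader.createKey();
        Bytes value = reader.createValue();
        int records = 0;
        while (reader.next(key, value)) {
            records++;
        }
        reader.close();
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wholefile", ".xml");
        Files.write(tmp, "<catalog>\n<item/>\n</catalog>\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("records: " + countRecords(tmp));  // prints "records: 1"
        Files.delete(tmp);
    }
}
```

The pull-style next(key, value) loop is what the old framework calls for each split; since isSplitable would return false, each file becomes exactly one record.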
-
Re: Re: Re: how to define new InputFormat with streaming?
springring, 2013-03-18, 02:48
You are right!
Now the import paths are all correct.