-
Avro and Hadoop streaming
Miki Tebeka 2011-06-02, 21:30
Greetings,

I'd like to use Hadoop streaming with Avro files. My plan is to write an InputFormat class that emits JSON records, one per line, so that the streaming application can read one record per line (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs).

I couldn't find any documentation or help on writing InputFormat classes. Can someone point me in the right direction?

Thanks,
--
Miki
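As an illustration of the "one JSON record per line" plan, here is a minimal sketch of what the streaming mapper side might look like in Python. It assumes each record is a JSON object, that a line may end in a tab (streaming separates key and value with a tab, and the value can be empty), and that the records have a `name` field; the field name and both function names are hypothetical, not something from this thread.

```python
import json
import sys


def parse_record(line):
    """Return the decoded JSON record on one input line, or None if blank."""
    # Streaming hands the mapper "key<TAB>value" lines. If the JSON text is
    # the key and the value is empty, the line ends in a tab, so strip it.
    text = line.rstrip("\r\n").rstrip("\t")
    if not text.strip():
        return None
    return json.loads(text)


def run_mapper(lines, out):
    """Write 'name<TAB>1' per record; the 'name' field is an assumption."""
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        out.write("%s\t1\n" % record.get("name", "unknown"))


# In an actual mapper script, the entry point would be:
#     run_mapper(sys.stdin, sys.stdout)
```

Keeping the parsing in its own function makes it easy to try the mapper on a few sample lines locally before submitting a job.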
-
Re: Avro and Hadoop streaming
Tatu Saloranta 2011-06-03, 16:18
On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Miki,
>
> Have you looked at AvroAsTextInputFormat?
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>
> Also, release 1.5.2 will include AvroTextOutputFormat:
>
> https://issues.apache.org/jira/browse/AVRO-830
>
> Are these perhaps what you're looking for?

Also, if anyone is interested in getting Jackson POJO data binding to work seamlessly, there are tentative plans for introducing various non-JSON reader/generator backends ("read/write X as if it were JSON"). Avro is one of the prime candidates, possibly the second most likely to be written after CSV. There are already backends that read XML, BSON and Smile (binary JSON) and expose them via Jackson's reader/generator to get full data binding support, and obviously trivial transcoding capabilities (JSON<->XML, for example) between any two supported formats.

So if this is of interest, feel free to come by the Jackson dev mailing list to talk about it a bit more.

-+ Tatu +-
-
Re: Avro and Hadoop streaming
Miki Tebeka 2011-06-15, 00:01
Greetings,

I've tried to run a job with the following command:

hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
    -input /in/avro \
    -output $out \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
    -inputformat AvroAsTextInputFormat

However, I get:

-inputformat : class not found : AvroAsTextInputFormat

I'm probably missing something obvious. Any ideas?

Thanks!
--
Miki
-
Re: Avro and Hadoop streaming
Harsh J 2011-06-15, 10:33
Miki,

You'll need to provide the entire canonical class name (org.apache.avro.mapred…).

--
Harsh J
-
Re: Avro and Hadoop streaming
Miki Tebeka 2011-06-15, 16:26
Still didn't work.

I'm pretty new to the Hadoop world. I probably need to place the Avro jar somewhere on the classpath of the nodes, but I have no idea how to do that.
-
Re: Avro and Hadoop streaming
Matt Pouttu-Clarke 2011-06-15, 16:30
You have to package it in the job jar file under a /lib directory.
-
Re: Avro and Hadoop streaming
Scott Carey 2011-06-15, 16:53
Hadoop has an old version of Avro in it. You must place the 1.6.0 jar (and relevant dependencies, or the avro-tools.jar with all dependencies bundled) in a location that gets picked up first in the task classpath.

Packaging it in the job jar works. I'm not sure whether putting it in the distributed cache and loading it as a library that way would.
-
Re: Avro and Hadoop streaming
Miki Tebeka 2011-06-15, 17:36
Found the magic (-files and -libjars):

jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar

hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -files $jars \
    -libjars $jars \
    -input /in/avro \
    -output /out/avro \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat

Thanks for all the help!
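For anyone scripting this, the working invocation can be assembled programmatically. The sketch below only builds the argument list; the function name and parameters are hypothetical, and the one ordering rule it encodes is that Hadoop's generic options (-files, -libjars) go before the streaming-specific options, as in the command above.

```python
def build_streaming_command(streaming_jar, jars, input_path, output_path,
                            mapper, reducer, input_format):
    """Return the argv list for a streaming job, generic options first."""
    jar_list = ",".join(jars)
    command = ["hadoop", "jar", streaming_jar]
    # Generic options must precede the streaming options.
    command += ["-files", jar_list, "-libjars", jar_list]
    # Streaming options, in the same order as the command in the thread.
    command += [
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
        "-file", mapper,
        "-file", reducer,
        "-inputformat", input_format,
    ]
    return command
```

The resulting list could be passed to `subprocess.check_call` on a machine where the `hadoop` launcher is on the PATH; passing a list rather than a single string avoids shell quoting issues.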
-
Re: Avro and Hadoop streaming
Mona Gandhi 2011-07-12, 00:36
I tried using the command that Miki posted, the only difference being the version of Avro (1.5.1 instead of 1.6.0). I can't seem to get it to work.

/home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -files avro-1.5.1.jar,avro-mapred-1.5.1.jar -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar -mapper test-mapper.py -reducer test-reducer.py -jobconf mapred.job.name=AvroTestJob --numReduceTasks 3 -file test-mapper.py -file test-reducer.py -inputformat org.apache.avro.mapred.AvroAsTextInputFormat -input avroevents -output AvroOutput

Error: -inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
Streaming Job Failed!

Thanks for all the help!
-
Re: Avro and Hadoop streaming
Miki Tebeka 2011-10-03, 23:21
I *think* streaming support was added only in 1.6
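Rather than guessing which release first shipped the class, one can look inside the jar itself, since a jar is an ordinary zip archive. The following is a minimal sketch; the helper name `jar_has_class` is hypothetical, and a positive result only shows that the class is in that jar, not that Hadoop loads that jar ahead of its own bundled Avro.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar holds a .class entry for the dotted class name."""
    # Class files are stored under their package path inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `jar_has_class("avro-mapred-1.5.1.jar", "org.apache.avro.mapred.AvroAsTextInputFormat")` would tell you whether the "class not found" error comes from the jar's contents or from how the jar is being shipped to the job.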