NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Avro >> mail # user >> Avro and Hadoop streaming


Miki Tebeka 2011-06-02, 21:30
Doug Cutting 2011-06-03, 08:43
Tatu Saloranta 2011-06-03, 16:18
Miki Tebeka 2011-06-15, 00:01
Harsh J 2011-06-15, 10:33
Miki Tebeka 2011-06-15, 16:26
Matt Pouttu-Clarke 2011-06-15, 16:30
Scott Carey 2011-06-15, 16:53
Miki Tebeka 2011-06-15, 17:36
Mona Gandhi 2011-07-12, 00:36

Re: Avro and Hadoop streaming
I *think* streaming support was added only in 1.6
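
For anyone writing the streaming scripts: AvroAsTextInputFormat hands each datum to the job as JSON text in the key, with an empty value, so the mapper sees one JSON document per input line. Below is a minimal sketch of a Python mapper in the spirit of the avro-mapper.py used in this thread. It assumes record-type data and a hypothetical field called "name", and it tolerates a trailing tab (plus empty value) after the key, since streaming may write key and value separated by a tab.

```python
#!/usr/bin/env python
"""Hypothetical streaming mapper for input read via AvroAsTextInputFormat.

Assumptions: each stdin line holds one Avro record rendered as JSON,
possibly followed by a tab and an empty value; "name" is an illustrative
field name, not something defined in this thread.
"""
import json
import sys


def parse_line(line):
    """Decode the JSON record carried in one streaming input line."""
    # Drop the newline, then keep only the key (text before the first tab);
    # JSON escapes tabs inside strings, so a raw tab can only be the separator.
    json_text = line.rstrip("\n").split("\t", 1)[0]
    return json.loads(json_text)


def map_lines(lines, field="name"):
    """Yield "key<TAB>1" lines keyed on a (hypothetical) record field."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = parse_line(line)
        # Records lacking the field are grouped under an empty key.
        yield "%s\t1" % record.get(field, "")


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        sys.stdout.write(out + "\n")
```

The script would be passed with -mapper and shipped with -file, as in the commands quoted in the thread; a reducer could then sum the counts per key.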

On Mon, Jul 11, 2011 at 5:36 PM, Mona Gandhi <[EMAIL PROTECTED]> wrote:
> I tried using the command that Miki posted, with the difference being the version of Avro (1.5.1 instead of 1.6.0). I can't seem to get it to work.
>
> /home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -files avro-1.5.1.jar,avro-mapred-1.5.1.jar -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar -mapper test-mapper.py -reducer test-reducer.py -jobconf mapred.job.name=AvroTestJob --numReduceTasks 3 -file test-mapper.py -file test-reducer.py  -inputformat org.apache.avro.mapred.AvroAsTextInputFormat -input avroevents -output AvroOutput
>
>
> Error: -inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
> Streaming Job Failed!
>
>
> Thanks for all the help!
>
> On Jun 15, 2011, at 10:36 AM, Miki Tebeka wrote:
>
>> Found the magic (-files and -libjars):
>>
>> jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
>>
>> hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
>>    -files $jars \
>>    -libjars $jars \
>>    -input /in/avro \
>>    -output /out/avro \
>>    -mapper avro-mapper.py \
>>    -reducer avro-reducer.py \
>>    -file avro-mapper.py \
>>    -file avro-reducer.py \
>>    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
>>
>> Thanks for all the help!
>>
>> On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
>>> Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
>>> (and relevant dependencies, or the avro-tools.jar with all dependencies
>>> bundled) in a location that gets picked up first in the task classpath.
>>>
>>> Packaging it in the job jar works. I'm not sure if putting it in the
>>> distributed cache and loading it as a library that way would.
>>>
>>> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
>>> <[EMAIL PROTECTED]> wrote:
>>>
>>>> You have to package it in the job jar file under a /lib directory.
>>>>
>>>>
>>>> On 6/15/11 9:26 AM, "Miki Tebeka" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Still didn't work.
>>>>>
>>>>> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
>>>>> somewhere on the classpath of the nodes,
>>>>> but I have no idea how to do that.
>>>>>
>>>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>>> Miki,
>>>>>>
>>>>>> You'll need to provide the entire canonical class name
>>>>>> (org.apache.avro.mapred.AvroAsTextInputFormat).
>>>>>>
>>>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've tried to run a job with the following command:
>>>>>>>
>>>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>>>    -input /in/avro \
>>>>>>>    -output $out \
>>>>>>>    -mapper avro-mapper.py \
>>>>>>>    -reducer avro-reducer.py \
>>>>>>>    -file avro-mapper.py \
>>>>>>>    -file avro-reducer.py \
>>>>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>>>    -inputformat AvroAsTextInputFormat
>>>>>>>
>>>>>>> However I get
>>>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>>>
>>>>>>> I'm probably missing something obvious to do.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Miki
>>>>>>>
>>>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>> Miki,
>>>>>>>>
>>>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>>>
>>>>>>>>
>>>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>>>
>>>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>>>
>>>>>>>> Are these perhaps what you're looking for?
>>>>>>>>
>>>>>>>> Doug
>>>>>>>>
>>>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>>>> My plan is to write an inputformat class that emits json records,