Avro, mail # user - Avro and Hadoop streaming


Miki Tebeka 2011-06-02, 21:30
Doug Cutting 2011-06-03, 08:43
Tatu Saloranta 2011-06-03, 16:18
Miki Tebeka 2011-06-15, 00:01
Harsh J 2011-06-15, 10:33
Miki Tebeka 2011-06-15, 16:26
Matt Pouttu-Clarke 2011-06-15, 16:30
Scott Carey 2011-06-15, 16:53

Re: Avro and Hadoop streaming
Miki Tebeka 2011-06-15, 17:36
Found the magic (-files and -libjars):

jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar

hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -files $jars \
    -libjars $jars \
    -input /in/avro \
    -output /out/avro \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
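
For readers wondering what the mapper side of this might look like, here is a minimal sketch of a script such as avro-mapper.py. It is not the script used in this thread. It assumes that AvroAsTextInputFormat hands each record to the mapper as one line of JSON text, possibly followed by a tab and an empty value, and the field name "name" is purely hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical avro-mapper.py: reads JSON records, one per line, from stdin."""
import json
import sys


def parse_record(line):
    """Return the decoded record, or None for a blank or malformed line."""
    # strip() drops the newline and any trailing tab that streaming may
    # append when the value part of the key/value pair is empty (assumption).
    text = line.strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except ValueError:
        return None


def map_lines(lines, field="name"):
    """Yield 'key<TAB>1' for every record that has the (assumed) field."""
    for line in lines:
        record = parse_record(line)
        if isinstance(record, dict) and field in record:
            yield "%s\t1" % (record[field],)


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Malformed lines are skipped here rather than failing the task; whether that is the right choice depends on the job.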

Thanks for all the help!

On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
> (and relevant dependencies, or the avro-tools.jar with all dependencies
> bundled) in a location that gets picked up first in the task classpath.
>
> Packaging it in the job jar works. I'm not sure if putting it in the
> distributed cache and loading it as a library that way would.
>
> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
> <[EMAIL PROTECTED]> wrote:
>
>>You have to package it in the job jar file under a /lib directory.
>>
>>
>>On 6/15/11 9:26 AM, "Miki Tebeka" <[EMAIL PROTECTED]> wrote:
>>
>>> Still didn't work.
>>>
>>> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
>>> somewhere on the classpath of the nodes,
>>> but I have no idea how to do that.
>>>
>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>> Miki,
>>>>
>>>> You'll need to provide the entire canonical class name
>>>> (org.apache.avro.mapred.AvroAsTextInputFormat).
>>>>
>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[EMAIL PROTECTED]> wrote:
>>>>> Greetings,
>>>>>
>>>>> I've tried to run a job with the following command:
>>>>>
>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>    -input /in/avro \
>>>>>    -output $out \
>>>>>    -mapper avro-mapper.py \
>>>>>    -reducer avro-reducer.py \
>>>>>    -file avro-mapper.py \
>>>>>    -file avro-reducer.py \
>>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>    -inputformat AvroAsTextInputFormat
>>>>>
>>>>> However I get
>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>
>>>>> I'm probably missing something obvious.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Miki
>>>>>
>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>>>> Miki,
>>>>>>
>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>
>>>>>>
>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>
>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>
>>>>>> Are these perhaps what you're looking for?
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>> My plan is to write an inputformat class that emits JSON records, one
>>>>>>> per line. This way the streaming application can read one record per
>>>>>>> line.
>>>>>>>
>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>
>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>> classes. Can someone point me in the right direction?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Miki
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>
>
Mona Gandhi 2011-07-12, 00:36
Miki Tebeka 2011-10-03, 23:21