Avro, mail # user - Avro and Hadoop streaming


Re: Avro and Hadoop streaming
Harsh J 2011-06-15, 10:33
Miki,

You'll need to provide the fully qualified class name
(org.apache.avro.mapred…), not just the simple class name.
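
To make the suggestion concrete, here is a minimal sketch that rebuilds the command quoted below with a package-qualified value for -inputformat. The package is taken from the Javadoc URL that appears later in this thread. The helper name streaming_command and the output path /out/avro are hypothetical, and the sketch only prints the command; it does not address whether the Avro jar is actually visible on the job's classpath, which the thread does not settle.

```python
import shlex

# Fully qualified class name, as given by the Javadoc URL quoted in this thread.
INPUT_FORMAT = "org.apache.avro.mapred.AvroAsTextInputFormat"


def streaming_command(out_dir, input_format=INPUT_FORMAT):
    """Return the streaming invocation from the thread as an argument list.

    Only the -inputformat value differs from the original command.
    """
    return [
        "hadoop", "jar", "./hadoop-streaming-0.20.2-cdh3u0.jar",
        "-input", "/in/avro",
        "-output", out_dir,
        "-mapper", "avro-mapper.py",
        "-reducer", "avro-reducer.py",
        "-file", "avro-mapper.py",
        "-file", "avro-reducer.py",
        "-cacheArchive", "/cache/avro-mapred-1.6.0-SNAPSHOT.jar",
        "-inputformat", input_format,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; /out/avro is a placeholder path.
    print(" ".join(shlex.quote(arg) for arg in streaming_command("/out/avro")))
```

Building the command as a list keeps each flag next to its value, so the one changed argument is easy to spot.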

On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[EMAIL PROTECTED]> wrote:
> Greetings,
>
> I've tried to run a job with the following command:
>
> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>    -input /in/avro \
>    -output $out \
>    -mapper avro-mapper.py \
>    -reducer avro-reducer.py \
>    -file avro-mapper.py \
>    -file avro-reducer.py \
>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>    -inputformat AvroAsTextInputFormat
>
> However I get
> -inputformat : class not found : AvroAsTextInputFormat
>
> I'm probably missing something obvious.
>
> Any ideas?
>
> Thanks!
> --
> Miki
>
> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> Miki,
>>
>> Have you looked at AvroAsTextInputFormat?
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>
>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>
>> https://issues.apache.org/jira/browse/AVRO-830
>>
>> Are these perhaps what you're looking for?
>>
>> Doug
>>
>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>> Greetings,
>>>
>>> I'd like to use hadoop streaming with Avro files.
>>> My plan is to write an inputformat class that emits json records, one
>>> per line. This way the streaming application can read one record per
>>> line.
>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>
>>> I couldn't find any documentation/help about writing inputformat
>>> classes. Can someone point me in the right direction?
>>>
>>> Thanks,
>>> --
>>> Miki
>>
>
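
For the plan quoted above (one JSON record per line on the mapper's standard input), a streaming mapper could look roughly like the sketch below. It is an illustration under assumptions, not the avro-mapper.py from this thread: the field name "name" is hypothetical, and the sketch assumes each input line is a JSON-encoded record that may be followed by a tab and an empty value.

```python
import json
import sys


def parse_record(line):
    """Parse one streaming input line into a Python object.

    Assumes the line holds a JSON-encoded record, possibly followed by a
    trailing tab (an empty value column) and a newline.
    """
    text = line.rstrip("\r\n").rstrip("\t")
    return json.loads(text)


def map_lines(lines, field="name"):
    """Yield "key<TAB>1" output lines for every record that has `field`.

    `field` is a hypothetical record field chosen for illustration.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_record(line)
        if isinstance(record, dict) and field in record:
            yield "{}\t1".format(record[field])


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Keeping the parsing in small functions means the mapper can be checked locally by piping a few JSON lines into it before submitting a job.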

--
Harsh J