Avro >> mail # user >> Avro and Hadoop streaming


Miki Tebeka 2011-06-02, 21:30
Doug Cutting 2011-06-03, 08:43
Tatu Saloranta 2011-06-03, 16:18

Re: Avro and Hadoop streaming
Greetings,

I've tried to run a job with the following command:

hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
    -input /in/avro \
    -output $out \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
    -inputformat AvroAsTextInputFormat

However, I get:
-inputformat : class not found : AvroAsTextInputFormat

I'm probably missing something obvious.

Any ideas?

Thanks!
--
Miki

On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Miki,
>
> Have you looked at AvroAsTextInputFormat?
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>
> Also, release 1.5.2 will include AvroTextOutputFormat:
>
> https://issues.apache.org/jira/browse/AVRO-830
>
> Are these perhaps what you're looking for?
>
> Doug
>
> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>> Greetings,
>>
>> I'd like to use hadoop streaming with Avro files.
>> My plan is to write an inputformat class that emits json records, one
>> per line. This way the streaming application can read one record per
>> line.
>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>
>> I couldn't find any documentation/help about writing inputformat
>> classes. Can someone point me in the right direction?
>>
>> Thanks,
>> --
>> Miki
>
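
For reference, here is a minimal sketch of what the mapper side of the "one JSON record per line" plan could look like. It is only an illustration, not the actual avro-mapper.py from this thread. It assumes each input line carries the JSON text of one record (a JSON object), possibly followed by a tab and an empty value, since streaming passes key<TAB>value lines to the mapper. The field name "name" and the helper names are hypothetical.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that reads one JSON record per line."""
import json
import sys

# Hypothetical field to group by; it does not come from the thread.
KEY_FIELD = "name"


def iter_records(lines):
    """Yield one decoded record per non-empty input line.

    Assumption: each line is the JSON text of one record, optionally
    followed by a tab and an empty value.
    """
    for line in lines:
        text = line.strip()  # drops the newline and any trailing tab
        if not text:
            continue  # skip blank lines
        yield json.loads(text)


def map_lines(lines, key_field=KEY_FIELD):
    """Yield "key<TAB>1" output lines, one per input record."""
    for record in iter_records(lines):
        # Records without the field are emitted under an empty key.
        yield "%s\t1" % record.get(key_field, "")


def main():
    # Streaming feeds records on stdin and collects output from stdout.
    for out in map_lines(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

A script like this can be checked locally, without a cluster, by piping a few JSON lines into it on the command line.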
Harsh J 2011-06-15, 10:33
Miki Tebeka 2011-06-15, 16:26
Matt Pouttu-Clarke 2011-06-15, 16:30
Scott Carey 2011-06-15, 16:53
Miki Tebeka 2011-06-15, 17:36
Mona Gandhi 2011-07-12, 00:36
Miki Tebeka 2011-10-03, 23:21