Avro >> mail # user >> Avro and Hadoop streaming


Re: Avro and Hadoop streaming
Found the magic (-files and -libjars):

jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar

hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -files $jars \
    -libjars $jars \
    -input /in/avro \
    -output /out/avro \
    -mapper avro-mapper.py \
    -reducer avro-reducer.py \
    -file avro-mapper.py \
    -file avro-reducer.py \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
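
For anyone following along, here is a minimal sketch of what a mapper like avro-mapper.py could look like. It is not the script used in this thread. It assumes AvroAsTextInputFormat hands streaming one JSON-encoded record per line (possibly followed by a tab and an empty value), and the field name "name" is purely hypothetical; substitute a field from your own schema.

```python
import json
import sys


def map_lines(lines, key_field="name"):
    """Yield 'key<TAB>1' for each JSON record line, skipping blank lines.

    key_field is a hypothetical field name; use one from your own schema.
    """
    for line in lines:
        # Drop the newline and any trailing tab left by an empty value.
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield "%s\t1" % record[key_field]


if __name__ == "__main__":
    # Hadoop streaming feeds records on stdin and reads results from stdout.
    for out in map_lines(sys.stdin):
        print(out)
```

Keeping the parsing in a function rather than at module level makes it easy to try the mapper locally on a few sample lines before submitting the job.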

Thanks for all the help!

On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
> Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
> (and relevant dependencies, or the avro-tools.jar with all dependencies
> bundled) in a location that gets picked up first in the task classpath.
>
> Packaging it in the job jar works. I'm not sure if putting it in the
> distributed cache and loading it as a library that way would.
>
> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
> <[EMAIL PROTECTED]> wrote:
>
>>You have to package it in the job jar file under a /lib directory.
>>
>>
>>On 6/15/11 9:26 AM, "Miki Tebeka" <[EMAIL PROTECTED]> wrote:
>>
>>> Still didn't work.
>>>
>>> I'm pretty new to hadoop world, I probably need to place the avro jar
>>> somewhere on the classpath of the nodes,
>>> however I have no idea how to do that.
>>>
>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>> Miki,
>>>>
>>>> You'll need to provide the entire canonical class name
>>>> (org.apache.avro.mapred.AvroAsTextInputFormat).
>>>>
>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[EMAIL PROTECTED]> wrote:
>>>>> Greetings,
>>>>>
>>>>> I've tried to run a job with the following command:
>>>>>
>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>    -input /in/avro \
>>>>>    -output $out \
>>>>>    -mapper avro-mapper.py \
>>>>>    -reducer avro-reducer.py \
>>>>>    -file avro-mapper.py \
>>>>>    -file avro-reducer.py \
>>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>    -inputformat AvroAsTextInputFormat
>>>>>
>>>>> However I get
>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>
>>>>> I'm probably missing something obvious to do.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks!
>>>>> --
>>>>> Miki
>>>>>
>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>>>> Miki,
>>>>>>
>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>
>>>>>>
>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>
>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>
>>>>>> Are these perhaps what you're looking for?
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>> My plan is to write an inputformat class that emits json records, one
>>>>>>> per line. This way the streaming application can read one record per
>>>>>>> line.
>>>>>>>
>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>
>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>> classes. Can someone point me in the right direction?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Miki
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>