Avro >> mail # user >> How to get started with examples on avro


felix gao 2011-01-28, 20:04
Harsh J 2011-01-28, 20:22
felix gao 2011-01-28, 20:29
Anand Padmanaban 2011-01-28, 21:02
felix gao 2011-01-28, 21:53
Re: How to get started with examples on avro
On Sat, Jan 29, 2011 at 1:59 AM, felix gao <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply.  I am interested in doing this through the java
> implementation and I would like to do it in parallel that utilizes the
> mapreduce framework.

That operation is pretty similar to writing a normal output data file.

You can use Avro's MapReduce API (which provides Input/Output Format
classes to use, given a Schema) to do so, or write your own custom
record-writing classes that convert your input format's record
representation into Avro-serialized records and write those out to an
open DataFile for a given schema. Alternatively, you can write
Avro-serialized data bytes into SequenceFiles.
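
As a rough illustration of the custom record-writing route, here is a
minimal sketch, assuming a current Avro Java release (Schema.Parser,
DataFileWriter, GenericDatumWriter) on the classpath. The LogLine schema,
the space-separated "timestamp level message" log layout, and the class
and method names are hypothetical and only there for illustration; adapt
them to your actual logs. It runs locally rather than as a MapReduce job.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LogToAvro {
    // Hypothetical schema: one record per log line, all fields as strings.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
      + "{\"name\":\"timestamp\",\"type\":\"string\"},"
      + "{\"name\":\"level\",\"type\":\"string\"},"
      + "{\"name\":\"message\",\"type\":\"string\"}]}";

    static final Schema SCHEMA = new Schema.Parser().parse(SCHEMA_JSON);

    // Convert one log line (assumed layout: "timestamp level message")
    // into an Avro generic record for the schema above.
    static GenericRecord toRecord(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("malformed log line: " + line);
        }
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("timestamp", parts[0]);
        record.put("level", parts[1]);
        record.put("message", parts[2]);
        return record;
    }

    // Open an Avro data file for the schema and append one record per line.
    static void writeAvro(List<String> lines, File out) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);
            for (String line : lines) {
                writer.append(toRecord(line));
            }
        }
    }

    // Read the data file back and count its records (a simple sanity check).
    static long countRecords(File in) throws IOException {
        long count = 0;
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {
                reader.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("logs", ".avro");
        writeAvro(List.of(
            "2011-01-28T20:04:00 INFO job started",
            "2011-01-28T20:05:12 WARN slow response from node"), out);
        System.out.println(countRecords(out) + " records written to " + out);
    }
}
```

In a MapReduce job, the same toRecord-style conversion would sit in the
mapper, with the Avro output format doing the file writing instead of
the explicit DataFileWriter used here.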

I believe the Hadoop MapReduce trunk may have some good code for Avro
serialization classes and examples of their use in MapReduce.

> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Based on the language you're targeting, have a look at its test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>>
>> Another good resource covering both ends of Avro (Data and RPC) is by
>> phunt at http://github.com/phunt/avro-rpc-quickstart#readme
>>
>> I had written a python data-file centric snippet for Avro a while ago
>> at my blog; it may help if you're looking to get started with Python
>> (although it does not cover all aspects, which the functions in the
>> available test cases for lang/python do):
>>
>> http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
>>
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <[EMAIL PROTECTED]> wrote:
>> > Hi all,
>> > I am trying to convert a lot of our existing logs into avro format in
>> > hadoop.  I am not sure if there are any examples to follow.
>> > Thanks,
>> > Felix
>>
>>
>>
>> --
>> Harsh J
>> www.harshj.com
>
>

--
Harsh J
www.harshj.com
Philip Zeyliger 2011-01-28, 21:44
Ron Bodkin 2011-01-28, 23:43