On Sat, Jan 29, 2011 at 1:59 AM, felix gao <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply. I am interested in doing this through the java
> implementation and I would like to do it in parallel that utilizes the
> mapreduce framework.
That operation is pretty similar to writing a normal output data file.
You can use Avro's MapReduce API (which provides Input/Output Format
classes to use, given a Schema), or write your own custom record-writer
classes that convert your input format's record representation into
Avro records and append them to an open DataFile for the target schema.
Alternatively, you can also write Avro-serialized data bytes into
SequenceFiles.
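To illustrate the DataFile route, here is a minimal sketch of converting a log line into an Avro record and appending it to a data file, then reading it back. The "LogLine" schema and its fields are hypothetical placeholders for illustration; you would replace them with a schema matching your own log format.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LogToAvro {
    // Hypothetical schema for illustration: one log line = timestamp + message.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
      + "{\"name\":\"ts\",\"type\":\"long\"},"
      + "{\"name\":\"msg\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = Schema.parse(SCHEMA_JSON);
        File out = new File("logs.avro");

        // Append converted records to an Avro data (container) file.
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, out);
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("ts", 1296239940L);               // parsed from the raw log line
        rec.put("msg", "GET /index.html 200");
        writer.append(rec);
        writer.close();

        // Read the file back to verify the round trip.
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(
                out, new GenericDatumReader<GenericRecord>());
        for (GenericRecord r : reader) {
            System.out.println(r.get("msg"));
        }
        reader.close();
    }
}
```

The same pattern works inside a custom RecordWriter: parse each input record, fill a GenericRecord, append to the open DataFileWriter.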
I believe the Hadoop MapReduce trunk has some good examples of Avro
serialization classes and their use in MapReduce.
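For the parallel/MapReduce route, here is a rough sketch of a map-only job that reads text log lines and writes Avro output via the org.apache.avro.mapred classes. This is an untested outline, not a definitive implementation: the schema, the field names, and the line-parsing step are placeholders you would adapt to your log format, and the job needs the Avro and Hadoop jars on the classpath.

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogConvertJob {
    // Hypothetical single-field schema; replace with your real log schema.
    static final Schema SCHEMA = Schema.parse(
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
      + "{\"name\":\"msg\",\"type\":\"string\"}]}");

    public static class ConvertMapper extends MapReduceBase
            implements Mapper<LongWritable, Text,
                              AvroWrapper<GenericRecord>, NullWritable> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<AvroWrapper<GenericRecord>, NullWritable> out,
                        Reporter reporter) throws IOException {
            GenericRecord rec = new GenericData.Record(SCHEMA);
            // Format-specific parsing of the raw log line goes here.
            rec.put("msg", line.toString());
            out.collect(new AvroWrapper<GenericRecord>(rec), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LogConvertJob.class);
        conf.setJobName("logs-to-avro");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(ConvertMapper.class);
        conf.setNumReduceTasks(0);              // map-only conversion
        AvroJob.setOutputSchema(conf, SCHEMA);  // wires up Avro output format
        JobClient.runJob(conf);
    }
}
```

With zero reducers, each mapper's output lands directly in Avro data files in the output directory, so the conversion parallelizes across input splits.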
> On Fri, Jan 28, 2011 at 12:22 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Based on the language you're targeting, have a look at its test cases
>> available in the project's version control:
>> http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out
>> via SVN, or via Git mirrors]
>> Another good resource covering both sides of Avro (Data and RPC) is
>> phunt's quickstart at http://github.com/phunt/avro-rpc-quickstart#readme
>> I wrote a Python data-file-centric snippet for Avro a while ago on my
>> blog; it may help if you're looking to get started with Python
>> (although it does not cover everything that the available test cases
>> for lang/python do):
>> On Sat, Jan 29, 2011 at 1:34 AM, felix gao <[EMAIL PROTECTED]> wrote:
>> > Hi all,
>> > I am trying to convert a lot of our existing logs into avro format in
>> > hadoop. I am not sure if there are any examples to follow.
>> > Thanks,
>> > Felix
>> Harsh J