-
How to get started with examples on avro (felix gao, 2011-01-28, 20:04)
Hi all,

I am trying to convert a lot of our existing logs into Avro format in Hadoop. I am not sure if there are any examples to follow.

Thanks,
Felix
-
Re: How to get started with examples on avro (Harsh J, 2011-01-28, 20:22)
Based on the language you're targeting, have a look at its test cases, available in the project's version control: http://svn.apache.org/repos/asf/avro/trunk/lang/ [You can check it out via SVN, or via Git mirrors]

Another good resource on both ends of Avro (data and RPC) is by phunt at http://github.com/phunt/avro-rpc-quickstart#readme

I wrote a Python data-file centric snippet for Avro a while ago on my blog; it may help if you're looking to get started with Python (although it does not cover all aspects, which the functions in the available test cases for lang/python do): http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/

--
Harsh J
www.harshj.com
-
Re: How to get started with examples on avro (felix gao, 2011-01-28, 20:29)
Thanks for the quick reply. I am interested in doing this with the Java implementation, and I would like to do it in parallel using the MapReduce framework.
-
RE: How to get started with examples on avro (Anand Padmanaban, 2011-01-28, 21:02)
Meta question: I see Avro is the means, but what is the end goal? What do you want to do with the data after converting it to Avro?
-
Re: How to get started with examples on avro (Harsh J, 2011-01-28, 21:31)
On Sat, Jan 29, 2011 at 1:59 AM, felix gao <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply. I am interested in doing this through the java
> implementation and I would like to do it in parallel that utilizes the
> mapreduce framework.

That operation is pretty similar to writing a normal output data file. You can use the MapReduce API of Avro (which provides Input/Output Format classes to use, given a Schema) to do so, or write your own custom record-writing classes that convert your input format's record representation to Avro serialized records and write those out to an open DataFile for a given schema.

Alternatively, you can also write Avro serialized data bytes into SequenceFiles. I believe the Hadoop MapReduce trunk may have some good code on Avro serialization classes and their use in MapReduce.

--
Harsh J
www.harshj.com
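
As a rough illustration of the "convert your input format's record representation" step in the custom record-writing approach, here is a minimal Java sketch that uses only the standard library. The log layout (epoch milliseconds, level, free-text message), the field names, and the class LogLineToRecord are all assumptions made up for this example; none of them come from Avro or from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns one text log line into a field-name -> value map,
// the intermediate form you would then copy into an Avro record.
final class LogLineToRecord {

    // Assumed layout: "<epoch-millis> <LEVEL> <free-text message>"
    static Map<String, Object> parse(String line) {
        // Split into at most three parts so the message keeps its inner spaces.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Malformed log line: " + line);
        }
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("timestamp", Long.parseLong(parts[0])); // would map to an Avro long
        record.put("level", parts[1]);                     // would map to an Avro string
        record.put("message", parts[2]);                   // would map to an Avro string
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = parse("1296245040000 INFO job started");
        System.out.println(record);
    }
}
```

With the Avro Java library on the classpath, each entry of such a map would typically be put into a GenericData.Record built from your schema and appended to a DataFileWriter; that part is left out here so the sketch stays dependency-free.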
-
Re: How to get started with examples on avro (Philip Zeyliger, 2011-01-28, 21:44)
Felix,

After you've figured out how to make it work for your application, I do encourage you to contribute (https://cwiki.apache.org/AVRO/how-to-contribute.html) examples to the open source project. We'll find a place for them!

-- Philip
-
Re: How to get started with examples on avro (felix gao, 2011-01-28, 21:53)
The goal of converting to Avro is to take advantage of its splittable property, so that we can store our huge log files in a compressed form and save some HDFS disk space.
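
The reason compression and splitting can coexist is that an Avro data file groups records into blocks, compresses each block on its own, and separates blocks with a sync marker, so a reader can start at any block boundary. The Java sketch below illustrates only that idea with the standard library's Deflater and Inflater. It is not the Avro file format, and the class name BlockCompressionSketch and the block size are assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy illustration: compress log lines in independent blocks so that any
// single block can be decompressed without reading the ones before it.
final class BlockCompressionSketch {

    static List<byte[]> compressInBlocks(List<String> lines, int linesPerBlock) {
        if (linesPerBlock <= 0) {
            throw new IllegalArgumentException("linesPerBlock must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerBlock) {
            int end = Math.min(start + linesPerBlock, lines.size());
            String joined = String.join("\n", lines.subList(start, end));
            blocks.add(deflate(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses one block in isolation; no other block is needed.
    static String inflate(byte[] block) {
        Inflater inflater = new Inflater();
        inflater.setInput(block);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported block");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt block", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("line a", "line b", "line c", "line d", "line e");
        List<byte[]> blocks = compressInBlocks(lines, 2);
        System.out.println("blocks: " + blocks.size());
        // Read only the second block, as a map task assigned to that split might.
        System.out.println(inflate(blocks.get(1)));
    }
}
```

A whole-file gzip stream, by contrast, has to be read from the beginning, which is why a single large gzipped log cannot be divided among several mappers.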
-
Re: How to get started with examples on avro (Ron Bodkin, 2011-01-28, 23:43)
The Colossal Pipe (https://github.com/ThinkBigAnalytics/colossal-pipe) framework also supports working with Avro as its native format for Java map-reduce, and it lets you read JSON or text files as input to mappers, which makes it fairly easy to use for this kind of conversion job. E.g., the heart of the program would be just this:

    // hr is the hour being converted, e.g. 2011/01/28/03
    ColFile inlogs = ColFile.at("/dfs/logs/json/" + hr).of(LogFormat.class).jsonFormat();
    ColFile outlogs = ColFile.at("/dfs/logs/avro/" + hr).of(Log.class);
    ColPhase copy = new ColPhase().reads(inlogs).writes(outlogs)
        .map(IdentityMapper.class)
        .groupBy("timestamp")
        .reduce(IdentityReducer.class);
    ColPipe conversion = new ColPipe(getClass()).named("log conversion");
    conversion.produces(outlogs);

You'd currently define an identity mapper and reducer (soon it will default to those):

    public static class IdentityMapper extends BaseMapper<Log, Log> {
        @Override
        public void map(Log in, Log out, ColContext<Log> context) {
            super.map(in, out, context);
        }
    }

    public static class IdentityReducer extends BaseReducer<Log, Log> {
        @Override
        public void reduce(Iterable<Log> in, Log out, ColContext<Log> context) {
            super.reduce(in, out, context);
        }
    }

Ron

Ron Bodkin
CEO, Think Big Analytics