Avro user mailing list: Getting started with Avro + Reading from an Avro formatted file


selvi k 2012-01-24, 15:31
Douglas Creager 2012-01-24, 15:54

Re: Getting started with Avro + Reading from an Avro formatted file
Selvi,

Expanding on Douglas's response: if you have installed Avro's Python
libraries (the simplest way to get the latest stable release is
"easy_install avro", or you can install from the distribution; post back
if you need help with this), you can simply run the now-installed 'avro'
executable:

$ ls
sample_input.avro

$ avro cat sample_input.avro --format csv
011990-99999,0,-619524000000
011990-99999,22,-619506000000
011990-99999,-11,-619484400000
012650-99999,111,-655531200000
012650-99999,78,-655509600000

Or redirect the output to a file, as you normally would in a shell:

$ avro cat sample_input.avro --format csv > sample_input.csv
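If you would rather do this from a Python script (you mention wanting to learn Python), here is a minimal sketch that produces output comparable to "avro cat --format csv". It assumes the avro Python package's DataFileReader and DatumReader classes, and assumes the decoded dicts list their keys in the schema's field order; the function names are hypothetical, and this is not how the avro command itself is implemented.

```python
import csv


def read_avro_records(path):
    """Yield each record of an Avro data file as a dict.

    Assumes the 'avro' Python package is installed. It is imported inside
    this function so that write_records_csv() below works without it.
    """
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    reader = DataFileReader(open(path, "rb"), DatumReader())
    try:
        for record in reader:
            yield record
    finally:
        reader.close()  # closes the reader and the file it wraps


def write_records_csv(records, out):
    """Write each record's values, in the dict's key order, as one CSV row.

    No header row is written. Nested values (records, arrays, maps) are
    written in Python's str() form, so this only approximates avro cat.
    """
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        writer.writerow(record.values())
```

For example, after "import sys", calling write_records_csv(read_avro_records("sample_input.avro"), sys.stdout) would print one CSV line per record; pass an open file object instead of sys.stdout to write a .csv file directly.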

For more on the options accepted by avro's cat and write commands:

$ avro --help
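Since the goal is a CSV of only certain log entries, here is a minimal, standard-library-only sketch of selecting columns and filtering rows from records that have already been decoded into dicts (for instance, the dicts yielded when iterating the avro library's DataFileReader). The function name, its parameters, and any field names you pass are hypothetical; adapt them to the actual schema of the job history file.

```python
import csv


def select_to_csv(records, fields, out, keep=None, header=True):
    """Write only the named fields of the records for which keep(record) is true.

    records: any iterable of dicts (e.g. records read from an Avro file).
    fields:  list of field names, in the column order wanted in the CSV.
    keep:    optional predicate; when None, every record is written.
    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow(fields)
    count = 0
    for record in records:
        if keep is not None and not keep(record):
            continue
        # .get() returns None for a missing field; csv writes None as an empty cell
        writer.writerow([record.get(name) for name in fields])
        count += 1
    return count
```

Keeping the filter as a plain predicate (e.g. keep=lambda r: r.get("some_field") == "some_value", with hypothetical names) means the same function works whether the records come from an Avro file or from a test list.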

On Tue, Jan 24, 2012 at 9:01 PM, selvi k <[EMAIL PROTECTED]> wrote:
> Hello All,
>
>
> I would like some suggestions on where I can start in the Avro project.
>
>
> I want to be able to read from an Avro formatted log file (specifically the
> History Log file created at the end of a Hadoop job) and create a Comma
> Separated file of certain log entries. I need a csv file because this is the
> format accepted by the post-processing software I am working with (e.g.
> Matlab).
>
>
> Initially I was using a BASH script to grep and awk from this file and
> create my CSV file because I needed a very few values from it, and a quick
> script just worked. I didn't try to learn what format the log file was in
> or make use of it (my bad!). Now that I need to scale up and want a
> reliable way to parse it, I would like to do it the right way.
>
>
> My question is this: for the above goal, could you please suggest steps I
> can follow, such as reading material and libraries I could try? As I go
> through the Quick Start Guide and FAQ, I see that much of the information
> there is geared toward someone who wants to use the data serialization and
> RPC functionality provided by Avro. Given that I only want to be able to
> "read", where should I start?
>
>
> I can comfortably script with BASH and Perl. Given that I only see support
> for Java, Python and Ruby, I think I can take this as an opportunity to
> learn Python and get up to speed.
>
>
> Thanks a lot.
>
>
> -Selvi
>
>

--
Harsh J
Customer Ops. Engineer, Cloudera
selvi k 2012-01-24, 19:37
selvi k 2012-01-24, 20:20
Harsh J 2012-01-24, 20:44
selvi k 2012-01-25, 02:46
Douglas Creager 2012-01-24, 21:00
selvi k 2012-01-25, 02:50
Harsh J 2012-01-24, 21:06
selvi k 2012-01-25, 02:56