Avro >> mail # user >> Getting started with Avro + Reading from an Avro formatted file


Re: Getting started with Avro + Reading from an Avro formatted file
Selvi,

Expanding on Douglas' response: if you have installed Avro's Python
libraries (the simplest way to get the latest stable release is
"easy_install avro", or you can install from the distribution; post back
if you need help with this), you can simply use the now-installed 'avro'
executable:

$ ls
sample_input.avro

$ avro cat sample_input.avro --format csv
011990-99999,0,-619524000000
011990-99999,22,-619506000000
011990-99999,-11,-619484400000
012650-99999,111,-655531200000
012650-99999,78,-655509600000

Or redirect the output to a file, as you would normally in a shell:

$ avro cat sample_input.avro --format csv > sample_input.csv

For more on the options accepted by avro's cat and write subcommands:

$ avro --help
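
If you only need certain fields, or want to do the same thing from a
Python script, here is a minimal sketch. It assumes the 'avro' package is
installed and uses its DataFileReader and DatumReader classes. The field
names ("station", "temp", "time") are hypothetical, since the schema of
sample_input.avro is not shown here; substitute the names from your own
file's schema.

```python
import csv
import io


def records_to_csv(records, fields, out):
    """Write the chosen fields of each record (a dict) as one CSV row."""
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        # Fields missing from a record are written as empty cells.
        writer.writerow([record.get(name, "") for name in fields])


def to_csv_text(records, fields):
    """Return the CSV as a string; handy for a quick check without files."""
    buf = io.StringIO()
    records_to_csv(records, fields, buf)
    return buf.getvalue()


def avro_file_to_csv(avro_path, csv_path, fields):
    """Read an Avro data file and write the chosen fields to a CSV file."""
    # Imported here so the helpers above work even without avro installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # The reader yields one dict per record when the schema is a record type.
    reader = DataFileReader(open(avro_path, "rb"), DatumReader())
    try:
        with open(csv_path, "w", newline="") as out:
            records_to_csv(reader, fields, out)
    finally:
        reader.close()


if __name__ == "__main__":
    # Hypothetical field names; replace them with those in your schema.
    avro_file_to_csv("sample_input.avro", "sample_input.csv",
                     ["station", "temp", "time"])
```

The CSV writing is kept separate from the Avro reading so that you can
test the field selection on plain dicts before pointing it at a real file.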

On Tue, Jan 24, 2012 at 9:01 PM, selvi k <[EMAIL PROTECTED]> wrote:
> Hello All,
>
>
> I would like some suggestions on where I can start in the Avro project.
>
>
> I want to be able to read from an Avro formatted log file (specifically the
> History Log file created at the end of a Hadoop job) and create a Comma
> Separated file of certain log entries. I need a csv file because this is the
> format that is accepted by post processing software I am working with (eg:
> Matlab).
>
>
> Initially I was using a BASH script to grep and awk from this file and
> create my CSV file because I needed a very few values from it, and a quick
> script just worked. I didn't try to get to know what format the log file was
> in and utilize that. (my bad!)  Now that I need to be scaling up and want to
> have a reliable way to parse, I would like to try and do it the right way.
>
>
> My question is this: For the above goal, could you please guide me with
> steps I can follow - such as reading material and libraries I could try to
> use. As I go through the Quick Start Guide and FAQ, I see that a lot of the
> information here is geared to someone who wants to use the data
> serialization and RPC functionality provided by Avro. Given that I only want
> to be able to "read", where may I start?
>
>
> I can comfortably script with BASH and Perl. Given that I only see support
> for Java, Python and Ruby, I think I can take this as an opportunity to
> learn Python and get up to speed.
>
>
> Thanks a lot.
>
>
> -Selvi
>
>

--
Harsh J
Customer Ops. Engineer, Cloudera