Avro >> mail # user >> Getting started with Avro + Reading from an Avro formatted file


Re: Getting started with Avro + Reading from an Avro formatted file
If you want to try out the Python API for Avro data files, I wrote a
short blog post on reading and writing them at
http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
which I think still holds up. Hope this helps.
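
For the filtering question in the quoted message below, here is a minimal sketch (not a definitive implementation) of reading records through the Python API and writing only the matching ones to CSV. It assumes Python 3 and an installed `avro` package. The helper names (`iter_avro_records`, `records_to_csv`), the file name `events.avro`, and the sample field names are hypothetical and chosen only for illustration.

```python
import csv
import io


def iter_avro_records(path):
    """Yield each record of an Avro data file as a dict.

    Assumes the `avro` package is installed. It is imported lazily so the
    CSV helper below can be used on any iterable of dicts.
    """
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    reader = DataFileReader(open(path, "rb"), DatumReader())
    try:
        for record in reader:
            yield record
    finally:
        reader.close()


def records_to_csv(records, fields, out, keep=lambda record: True):
    """Write the chosen fields of matching records as CSV; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    written = 0
    for record in records:
        if keep(record):
            writer.writerow([record.get(name, "") for name in fields])
            written += 1
    return written


# Hypothetical log records standing in for iter_avro_records("events.avro").
sample = [
    {"host": "web1", "status": 200, "path": "/"},
    {"host": "web2", "status": 500, "path": "/login"},
    {"host": "web1", "status": 404, "path": "/missing"},
]
buffer = io.StringIO()
count = records_to_csv(sample, ["host", "status"], buffer,
                       keep=lambda r: r["status"] >= 400)
print(buffer.getvalue())
```

With a real file, the same call would take `iter_avro_records("events.avro")` in place of `sample`. The condition is an ordinary Python function applied to each record, so the grep and awk steps can be folded into one pass over the file.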

On Wed, Jan 25, 2012 at 1:50 AM, selvi k <[EMAIL PROTECTED]> wrote:
> I found out what the issue was:
> I first needed to install snappy downloaded from here:
> http://code.google.com/p/snappy/
>
> After a simple ./configure, make and make install, 'easy_install avro'
> completed successfully.
>
> I will try out both the CSV conversion options and update this thread in a
> bit.
>
> -Selvi
>
>
>
> On Tue, Jan 24, 2012 at 2:37 PM, selvi k <[EMAIL PROTECTED]> wrote:
>>
>> Douglas and Harsh - Thanks a lot for the immediate and detailed replies!
>> Looks like both of these would work well for me.
>>
>>
>> In order to start trying these, I have tried a few things to get started
>> with Avro, but this is where I am stuck:
>>
>>
>> 1. I first downloaded the stable version in the form of
>> "avro-1.6.1.tar.gz". (I am working all this out on an Ubuntu 10.04 machine.)
>>
>> I don't find a readme file and am not familiar with installing a Python
>> package, so I am not sure if what I am doing is correct. After some basic
>> googling, I did:
>>
>> avro-1.6.1$ ./setup.py build
>>
>> This appears to complete successfully. Then when I do this:
>>
>> ...avro-1.6.1$ sudo ./setup.py install
>>
>> I get an error message. (pasted at the end of this mail [1])
>>
>>
>> 2. I tried the technique suggested by Harsh, but it ends with an error
>> similar to the one pasted below in [2]
>>
>> /avro$ sudo easy_install avro
>>
>> Then I tried to install snappy by itself:
>>
>> /avro$ sudo easy_install python-snappy
>>
>> I get the same error.
>>
>> Also, I read that this might help with this type of error, so I tried:
>>
>> avro$ sudo apt-get install python2.6-dev
>>
>> I ensured I have gcc and installed g++ too (because I wasn't sure what was
>> needed).
>>
>> I did see a similar error message reported here for Avro and OS X:
>> https://issues.apache.org/jira/browse/AVRO-981
>>
>> Before installing g++ and python-dev, the error message I was seeing from
>> easy_install python-snappy was different and shorter (attached below) [3].
>>
>>
>>
>>
>> Sorry if I should just be reading up on general Python development or
>> packages or installs (and/or other things), before I should even be
>> attempting to do this.  I'll be doing that now to move further.  But in case
>> anyone might have suggestions for the errors I am seeing, that would be
>> great.
>>
>>
>> I did find this Quick Start Guide from the main Avro wiki page, but when I
>> look through the Python example, it is once again focused on client/server
>> RPC communication:
>>
>> https://github.com/phunt/avro-rpc-quickstart
>>
>>
>> Also, my understanding is that I must 'install' or deploy Avro before I can
>> try out the C bindings suggested by Douglas. I am stating this since I am
>> not exactly clear on what this meant: "especially since the C bindings
>> don't have any library dependencies to install". I am assuming it means I
>> don't need anything beyond a basic install of Avro.
>>
>>
>>
>> 3. With regard to the two suggested approaches, would either of them
>> allow me to filter my data records using a condition on a field (or a
>> few fields)? If not, it seems I would have to first grep the log file
>> for the condition I want, and then use either of these two techniques
>> to convert the result to a CSV file. This would still be much better
>> than what I am doing now, which is not-so-pretty awk invocations to
>> retrieve the fields I need (after the initial grep). But if the
>> existing API allows me to scan through the log file and specify
>> conditions for fields, it might be much more efficient. I can imagine that I
>> might have to use the low-level API and write a program to do this, but I am

Harsh J
Customer Ops. Engineer, Cloudera