Re: Getting started with Avro + Reading from an Avro formatted file
Harsh J 2012-01-24, 20:44
If you want to try out the Python API for Avro datafiles, I had
written a short blog post on reading/writing them at
http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
which I think still holds good.

Hope this helps.

On Wed, Jan 25, 2012 at 1:50 AM, selvi k <[EMAIL PROTECTED]> wrote:
> I found out what the issue was:
> I first needed to install snappy downloaded from here:
> http://code.google.com/p/snappy/
>
> After a simple ./configure, make and make install, 'easy_install avro'
> completed successfully.
>
> I will try out both the CSV conversion options and update this thread in a
> bit.
>
> -Selvi
>
>
> On Tue, Jan 24, 2012 at 2:37 PM, selvi k <[EMAIL PROTECTED]> wrote:
>>
>> Douglas and Harsh - Thanks a lot for the immediate and detailed replies!
>> Looks like both of these would work well for me.
>>
>>
>> In order to start trying these, I have tried a few things to get started
>> with Avro, but this is where I am stuck:
>>
>>
>> 1. I first downloaded the stable version in the form of
>> "avro-1.6.1.tar.gz". (I am working out all this on a Ubuntu 10.04 machine).
>>
>> I don't find a readme file and am not familiar with installing a python
>> package, so I am not sure if what I am doing is correct. After some basic
>> googling, I did:
>>
>> avro-1.6.1$ ./setup.py build
>>
>> This appears to complete successfully. Then when I do this:
>>
>> ...avro-1.6.1$ sudo ./setup.py install
>>
>> I get an error message. (pasted at the end of this mail [1])
>>
>>
>> 2. I tried the technique suggested by Harsh, but it ends with a similar
>> error as pasted below in [2]
>>
>> /avro$ sudo easy_install avro
>>
>> Then I tried to install snappy by itself:
>>
>> /avro$ sudo easy_install python-snappy
>>
>> I get the same error.
>>
>> Also I read that this might help with this type of error, so I tried:
>>
>> avro$ sudo apt-get install python2.6-dev
>>
>> I ensured I have gcc and installed g++ too (because I wasn't sure what was
>> needed).
>>
>> I did see a similar error message reported here for Avro and OS X:
>> https://issues.apache.org/jira/browse/AVRO-981
>>
>> Before installing g++ and python-dev, the error message I was seeing from
>> easy_install python_snappy was different and shorter (attached below) [3].
>>
>>
>> Sorry if I should just be reading up on general Python development or
>> packages or installs (and/or other things), before I should even be
>> attempting to do this. I'll be doing that now to move further. But in case
>> anyone might have suggestions for the errors I am seeing, that would be
>> great.
>>
>>
>> I did find this Quick Start Guide from the main Avro wiki page, but when I
>> look through the Python example it is once again focussed on client/server
>> and RPC communication between them:
>>
>> https://github.com/phunt/avro-rpc-quickstart
>>
>>
>> Also my understanding is that I must 'install' or deploy Avro before I can
>> try out the C bindings suggested by Douglas. I am stating this since I am
>> not exactly clear by what this meant: - "especially since the C bindings
>> don't have any library dependencies to install". I am assuming it means, I
>> don't need anything beyond a basic install of Avro.
>>
>>
>> 3. With regards to the two suggested ways, would either of these
>> techniques allow me to filter my data records using some sort of a condition
>> on a field (or a few fields)? If not it seems like I would have to resort to
>> first grepping the log file with the condition I want, and then using either
>> of these two techniques to convert to a CSV file. This would still be much
>> better than what I am doing now, which is through not-so-pretty awk
>> invocations to retrieve the fields I need (after the initial grep). But if
>> the existing API allows me to scan through the log file and specify
>> conditions for fields, it might be much more efficient.
>> I can imagine that I
>> might have to use the low-level API and write a program to do this, but I am

Harsh J
Customer Ops. Engineer, Cloudera
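
On the filtering question in point 3 of the quoted mail, here is a minimal sketch of how a single pass over the data file could both apply a condition and write CSV. It assumes the Python `avro` package is installed (as with 'easy_install avro' above) and that the file's schema is a record, so that each datum comes back from the reader as a dict. It is written for Python 3; on the Python 2.6 setup in this thread the CSV file would be opened with "wb" instead. The helper names `write_filtered_csv` and `avro_to_csv` are made up for illustration and are not part of the Avro API.

```python
import csv


def write_filtered_csv(records, fields, predicate, out):
    """Write the chosen fields of every record that satisfies predicate.

    records   -- any iterable of dicts (an Avro DataFileReader yields these
                 when the file's schema is a record)
    fields    -- field names to emit, in column order
    predicate -- function taking one record dict, returning True to keep it
    out       -- writable text file object
    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(fields)
    kept = 0
    for record in records:
        if predicate(record):
            # Fields missing from a record are written as empty cells.
            writer.writerow([record.get(name) for name in fields])
            kept += 1
    return kept


def avro_to_csv(avro_path, csv_path, fields, predicate):
    """Scan an Avro data file once and write the matching records as CSV."""
    # Imported here so the helper above stays usable without avro installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    reader = DataFileReader(open(avro_path, "rb"), DatumReader())
    try:
        with open(csv_path, "w", newline="") as out:
            return write_filtered_csv(reader, fields, predicate, out)
    finally:
        reader.close()
```

A call might look like `avro_to_csv("access.avro", "errors.csv", ["host", "status"], lambda r: r["status"] >= 500)`, where the file names, the field names and the condition are all hypothetical and would be replaced by whatever the log schema actually contains. Keeping the condition as an ordinary Python function means the grep and awk steps can be folded into the same pass.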