Avro >> mail # dev >> 3x faster python reader


Uri Laserson 2013-04-29, 05:24
Doug Cutting 2013-04-29, 16:22
Philip Zeyliger 2013-04-29, 17:56
Uri Laserson 2013-04-29, 18:55
Re: 3x faster python reader
Hi,

I did the same for fastavro <https://bitbucket.org/tebeka/fastavro>. I
found changing the current code while keeping the same API very hard.

Another option is to leave the current code as version 1 and add the
new code either as a new module under avro or as avro2.

All the best,
--
Miki
On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I rewrote some of the Python code that reads Avro files.  I was able to
> achieve a ~3x speedup over the current implementation, and can probably do
> better if it were cleaned up more.  The main changes are:
> * Eliminated the object-oriented nature of the reader.  It's just functions
> now.  Presumably this can be changed back, but it didn't really seem like
> there was any reason for it.
> * Given a reader and writer schema, it precomputes as much helpful info as
> it can up front and caches this in a dictionary that the read functions use.
> * The code is compiled with Cython for speedup.
>
> How can this be used to improve the current python api?  Let me know how I
> can be helpful...
>
> Uri
>
> --
> Uri Laserson, PhD
> Data Scientist, Cloudera
> Twitter/GitHub: @laserson
> +1 617 910 0447
> [EMAIL PROTECTED]
>
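
For anyone following the thread, the precompute-and-cache idea from the quoted list (plain functions plus a dictionary built once per reader/writer schema pair) could be sketched roughly as below. This is not Uri's code and not fastavro's; it is a minimal pure-Python illustration that assumes flat record schemas with a handful of primitive field types, skips Cython entirely, and ignores type promotion, aliases, and nested types. The names `get_plan`, `read_record`, `PRIMITIVE_READERS`, and `_PLAN_CACHE` are made up for the example.

```python
import io
import json
import struct


def read_long(buf):
    """Decode an Avro int/long: variable-length, zigzag-encoded."""
    shift = 0
    acc = 0
    while True:
        byte = buf.read(1)[0]
        acc |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo zigzag


def read_string(buf):
    """Decode an Avro string: long byte length, then UTF-8 bytes."""
    return buf.read(read_long(buf)).decode("utf-8")


def read_double(buf):
    """Decode an Avro double: 8 bytes, little-endian IEEE 754."""
    return struct.unpack("<d", buf.read(8))[0]


def read_boolean(buf):
    """Decode an Avro boolean: a single byte, 0 or 1."""
    return buf.read(1) == b"\x01"


# Hypothetical lookup table; only a few primitive types are covered.
PRIMITIVE_READERS = {
    "int": read_long,
    "long": read_long,
    "string": read_string,
    "double": read_double,
    "boolean": read_boolean,
}

# Hypothetical module-level cache: one plan per (writer, reader) schema pair.
_PLAN_CACHE = {}


def get_plan(writer_schema, reader_schema):
    """Precompute everything read_record needs for this schema pair."""
    key = (json.dumps(writer_schema, sort_keys=True),
           json.dumps(reader_schema, sort_keys=True))
    plan = _PLAN_CACHE.get(key)
    if plan is None:
        reader_names = {f["name"] for f in reader_schema["fields"]}
        writer_names = {f["name"] for f in writer_schema["fields"]}
        # One step per writer field, in wire order: (name, decoder, keep?).
        steps = [(f["name"], PRIMITIVE_READERS[f["type"]],
                  f["name"] in reader_names)
                 for f in writer_schema["fields"]]
        # Reader-only fields are filled from their declared defaults
        # (this sketch assumes every such field has one).
        defaults = {f["name"]: f["default"]
                    for f in reader_schema["fields"]
                    if f["name"] not in writer_names}
        plan = _PLAN_CACHE[key] = (steps, defaults)
    return plan


def read_record(buf, plan):
    """Decode one record; no schema inspection happens per record."""
    steps, defaults = plan
    record = dict(defaults)
    for name, read_fn, keep in steps:
        value = read_fn(buf)  # always consume the bytes, even if dropped
        if keep:
            record[name] = value
    return record


writer = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "score", "type": "double"}]}
reader = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean", "default": True}]}

# id=3, name="hi", score=1.5 in Avro binary encoding.
data = b"\x06" + b"\x04hi" + struct.pack("<d", 1.5)
user = read_record(io.BytesIO(data), get_plan(writer, reader))
```

The point of the split is that `get_plan` does the schema comparison once, so the per-record loop in `read_record` only walks a list of prebuilt steps. A compiled version would presumably keep the same shape and type the hot loop.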
Russell Jurney 2013-04-30, 06:10
Uri Laserson 2013-04-30, 07:50