Avro, mail # dev - 3x faster python reader


Uri Laserson 2013-04-29, 05:24
Re: 3x faster python reader
Doug Cutting 2013-04-29, 16:22
Uri,

This sounds awesome!  Is the API compatible with the existing API?  If
it's incompatible and cannot easily be made compatible then perhaps we
can add it as the 'new' API and deprecate the old one.  Regardless,
please file an issue in Jira (issues.apache.org/jira/browse/AVRO) and
attach your patch there.

Thanks,

Doug

On Sun, Apr 28, 2013 at 10:24 PM, Uri Laserson <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I rewrote some of the python code to read avro files.  I was able to
> achieve a ~3x speedup over the current impl, and can probably do better if
> it was cleaned up more.  The main changes are:
> * Eliminated the object-oriented nature of the reader.  It's just functions
> now.  Presumably this can be changed back, but it didn't really seem like
> there was any reason for it.
> * Given a reader and writer schema, it precomputes as much helpful info as
> it can upfront and caches this in a dictionary that the read functions use
> * The code is compiled with Cython for speedup.
>
> How can this be used to improve the current python api?  Let me know how I
> can be helpful...
>
> Uri
>
> --
> Uri Laserson, PhD
> Data Scientist, Cloudera
> Twitter/GitHub: @laserson
> +1 617 910 0447
> [EMAIL PROTECTED]
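
The approach described in the quoted message (plain functions, plus a dictionary precomputed once from the writer and reader schemas) could look roughly like the sketch below. This is not Uri's patch and not the avro package's API: every name here (read_long, read_string, build_plan, read_record) is hypothetical, only flat records with long/int/string fields are handled, and reader-side defaults, type promotion, and the Cython compilation step are left out.

```python
import io


def read_long(buf):
    """Decode one Avro int/long: a zigzag-encoded variable-length integer."""
    b = buf.read(1)[0]
    n = b & 0x7F
    shift = 7
    while b & 0x80:  # high bit set means another byte follows
        b = buf.read(1)[0]
        n |= (b & 0x7F) << shift
        shift += 7
    return (n >> 1) ^ -(n & 1)  # undo zigzag encoding


def read_string(buf):
    """Decode one Avro string: a long byte count followed by UTF-8 bytes."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


# Hypothetical lookup table: primitive type name -> plain decoding function.
PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}


def build_plan(writer_schema, reader_schema):
    """Precompute, once per schema pair, how to handle each written field.

    The result is an ordinary dict so the per-record loop does no schema
    inspection at all (assumption: flat records of primitive fields only).
    """
    wanted = {field["name"] for field in reader_schema["fields"]}
    steps = [
        (field["name"], PRIMITIVE_READERS[field["type"]], field["name"] in wanted)
        for field in writer_schema["fields"]
    ]
    return {"steps": steps}


def read_record(buf, plan):
    """Decode one record using only the precomputed plan."""
    record = {}
    for name, reader, keep in plan["steps"]:
        value = reader(buf)  # always consume the bytes, even for dropped fields
        if keep:
            record[name] = value
    return record


writer_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "score", "type": "long"},
]}
reader_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
]}

plan = build_plan(writer_schema, reader_schema)
# id=1 -> 0x02, name="ab" -> 0x04 'a' 'b', score=-3 -> 0x05
record = read_record(io.BytesIO(b"\x02\x04ab\x05"), plan)
```

Here the plan is built once and reused for every record, so the per-record work reduces to a loop over prepared (name, function, keep) tuples; that inner loop is the part a Cython build would most plausibly speed up.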
Philip Zeyliger 2013-04-29, 17:56
Uri Laserson 2013-04-29, 18:55
Miki Tebeka 2013-04-29, 21:32
Russell Jurney 2013-04-30, 06:10
Uri Laserson 2013-04-30, 07:50