Avro, mail # dev - 3x faster python reader


3x faster python reader
Uri Laserson 2013-04-29, 05:24
Hi all,

I rewrote some of the Python code that reads Avro files.  I was able to
achieve a ~3x speedup over the current implementation, and could probably
do better if the code were cleaned up more.  The main changes are:
* Eliminated the object-oriented nature of the reader.  It's just functions
now.  Presumably this can be changed back, but it didn't really seem like
there was any reason for it.
* Given a reader and writer schema, it precomputes as much helpful info as
it can upfront and caches this in a dictionary that the read functions use.
* The code is compiled with Cython for speedup.
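
To illustrate the first two points, here is a minimal pure-Python sketch. It
is not the code from the rewrite described above. The names build_plan and
read_record, the layout of the "plan" dictionary, and the restriction to flat
records of int/long/string fields are all assumptions made for this example.
It resolves the writer and reader schemas once, then decodes records with
plain functions that only consult the cached result. Cython compilation is
left out.

```python
import io


def read_long(buf):
    """Decode one Avro int/long (zigzag varint) from a binary stream."""
    shift = 0
    accum = 0
    while True:
        byte = buf.read(1)[0]  # raises IndexError at end of stream
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo the zigzag mapping: 0, 1, 2, 3 -> 0, -1, 1, -2
    return (accum >> 1) ^ -(accum & 1)


def read_string(buf):
    """Decode an Avro string: a long byte count, then UTF-8 bytes."""
    return buf.read(read_long(buf)).decode("utf-8")


# Hypothetical lookup table; only a few primitive types are covered.
PRIMITIVE_READERS = {"int": read_long, "long": read_long, "string": read_string}


def build_plan(writer_schema, reader_schema):
    """Resolve the two record schemas once and cache the result in a dict."""
    reader_fields = {f["name"]: f for f in reader_schema["fields"]}
    writer_names = {f["name"] for f in writer_schema["fields"]}
    # One step per writer field, in writer order: (name, decoder, keep?)
    steps = [
        (f["name"], PRIMITIVE_READERS[f["type"]], f["name"] in reader_fields)
        for f in writer_schema["fields"]
    ]
    # Reader-only fields take their default (assumed to be present here).
    defaults = {
        name: f["default"]
        for name, f in reader_fields.items()
        if name not in writer_names
    }
    return {"steps": steps, "defaults": defaults}


def read_record(buf, plan):
    """Decode one record using only the precomputed plan."""
    record = dict(plan["defaults"])
    for name, read_fn, keep in plan["steps"]:
        value = read_fn(buf)  # always consume the bytes, even if skipped
        if keep:
            record[name] = value
    return record


writer = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]}
reader = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},
]}

plan = build_plan(writer, reader)       # done once per schema pair
data = io.BytesIO(b"\x06\x06uri\x01")   # id=3, name="uri", age=-1
user = read_record(data, plan)
```

The per-record loop does no schema inspection at all, so the cost of
resolving field names, skipped fields, and defaults is paid once per file
rather than once per record.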

How can this be used to improve the current Python API?  Let me know how I
can be helpful...

Uri

--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
[EMAIL PROTECTED]
Doug Cutting 2013-04-29, 16:22
Philip Zeyliger 2013-04-29, 17:56
Uri Laserson 2013-04-29, 18:55
Miki Tebeka 2013-04-29, 21:32
Russell Jurney 2013-04-30, 06:10
Uri Laserson 2013-04-30, 07:50