Avro, mail # user - Avro Python package slowness


Miki Tebeka 2011-05-06, 17:34
Scott Carey 2011-05-06, 17:40
Miki Tebeka 2011-05-06, 18:18
Doug Cutting 2011-05-06, 18:28
Miki Tebeka 2011-05-06, 19:31
Doug Cutting 2011-05-06, 21:06
Miki Tebeka 2011-05-06, 21:13
Doug Cutting 2011-05-06, 17:58

Re: Avro Python package slowness
Miki Tebeka 2011-05-06, 23:20
>> I'm using the Avro Python package (1.5.0), and it is slow.
> Does the schema have unions?  Last I checked, Python recursively
> validates data in order to determine which branch of a union should be
> written.  In the worst case (nested unions) this can lead to quadratic
> serialization times.
There are many unions, but not nested ones.
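
For readers following along, here is a minimal sketch of the validation-based branch selection described in the quote above. It does not use the avro package: the schema model (plain type-name strings, lists for unions, dicts for arrays) and the names `validate` and `pick_branch` are assumptions made for illustration only.

```python
# Hypothetical, simplified schema model (NOT the avro package's classes):
#   "null" / "string" / "long" / "double"  -> primitive type names
#   [schema, schema, ...]                  -> union
#   {"type": "array", "items": schema}     -> array

PRIMITIVE_CHECKS = {
    "null": lambda d: d is None,
    "string": lambda d: isinstance(d, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "long": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, float),
}


def validate(schema, datum):
    """Recursively check whether datum conforms to schema."""
    if isinstance(schema, list):
        # A union is valid if any branch validates; each branch is walked in full.
        return any(validate(branch, datum) for branch in schema)
    if isinstance(schema, dict) and schema["type"] == "array":
        return isinstance(datum, list) and all(
            validate(schema["items"], item) for item in datum
        )
    return PRIMITIVE_CHECKS[schema](datum)


def pick_branch(union, datum):
    """Return the index of the first union branch the datum validates against."""
    for index, branch in enumerate(union):
        if validate(branch, datum):
            return index
    raise TypeError("datum matches no branch of the union")
```

With this approach the whole datum is walked once to pick the branch and again to write it. When unions are nested, every level repeats that walk over its sub-tree, which is where the worst-case blow-up comes from.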

> It should be possible to determine the union
> branch to write much more efficiently.
Can you elaborate on how? I'll try to code this and submit a patch.
Also, I'm talking about reading the Avro file, not writing to it.

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
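
One possible way to pick a union branch without recursive validation is to dispatch on the Python type of the datum. The sketch below is only an illustration under stated assumptions, not the approach the thread settled on and not the avro package's API: schemas are plain type-name strings, and `PYTHON_TYPE_TO_AVRO`, `build_branch_index`, and `pick_branch_fast` are hypothetical names.

```python
# Hypothetical sketch: union branches are plain Avro type-name strings,
# not the avro package's Schema objects.

# Candidate Avro type names for each Python type, in preference order.
PYTHON_TYPE_TO_AVRO = {
    type(None): ("null",),
    bool: ("boolean",),
    int: ("int", "long"),
    float: ("float", "double"),
    str: ("string",),
    bytes: ("bytes",),
    list: ("array",),
    dict: ("map", "record"),
}


def build_branch_index(union):
    """Map each type name in the union to its position; built once per schema."""
    index = {}
    for position, branch in enumerate(union):
        index.setdefault(branch, position)
    return index


def pick_branch_fast(branch_index, datum):
    """Pick a branch from the datum's Python type alone, without walking the datum."""
    for name in PYTHON_TYPE_TO_AVRO.get(type(datum), ()):
        if name in branch_index:
            return branch_index[name]
    raise TypeError("no union branch for Python type %s" % type(datum).__name__)


# Usage: build the index once, then reuse it for every datum.
index = build_branch_index(["null", "string", "long"])
branch = pick_branch_fast(index, "hello")
```

The lookup cost no longer depends on the size of the datum, but the shortcut is ambiguous when several branches map to the same Python type (for example a map and a record, or an int and a long whose range matters). A real implementation would presumably fall back to validation for those cases.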
Doug Cutting 2011-05-11, 10:16