Re: Avro Python package slowness
>> I'm using the Avro Python package (1.5.0), and it is slow.
> Does the schema have unions?  Last I checked, the Python implementation
> recursively validates data in order to determine which branch of a union
> should be written.  In the worst case (nested unions) this can lead to
> quadratic serialization times.
There are many unions, but not nested ones.

> It should be possible to determine the union
> branch to write much more efficiently.
Can you elaborate on how? I'll try to code this up and submit a patch.
Also, the slowness I'm seeing is when reading the Avro file, not when writing to it.
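
For reference, a rough sketch of the write-side idea in the quoted text. This is a toy model, not the avro package's code: the schema representation and the names validate, branch_by_validation, kind_of and branch_by_type are all hypothetical. The first picker fully validates the datum against each branch in turn; the second looks only at the datum's top-level Python type, which assumes no two branches of the union share a top-level type.

```python
# Toy model (NOT the avro package's API): primitives are strings,
# unions are lists, records and arrays are dicts with a "type" key.
PRIMITIVES = {"null": type(None), "boolean": bool, "int": int,
              "double": float, "string": str}


def validate(schema, datum):
    """Deep check: recursively verify that datum matches schema."""
    if isinstance(schema, str):
        # bool is a subclass of int in Python, so reject it for "int".
        if schema == "int" and isinstance(datum, bool):
            return False
        return isinstance(datum, PRIMITIVES[schema])
    if isinstance(schema, list):  # union: any branch may match
        return any(validate(branch, datum) for branch in schema)
    if schema["type"] == "record":
        return isinstance(datum, dict) and all(
            validate(f["type"], datum.get(f["name"]))
            for f in schema["fields"])
    if schema["type"] == "array":
        return isinstance(datum, list) and all(
            validate(schema["items"], item) for item in datum)
    raise ValueError("unsupported schema: %r" % (schema,))


def branch_by_validation(union, datum):
    """Slow path: fully validate the datum against each branch in turn."""
    for index, branch in enumerate(union):
        if validate(branch, datum):
            return index
    raise TypeError("datum matches no branch")


def kind_of(datum):
    """Map a Python value to a top-level schema kind without recursing."""
    if datum is None:
        return "null"
    if isinstance(datum, bool):  # must be tested before int
        return "boolean"
    if isinstance(datum, int):
        return "int"
    if isinstance(datum, float):
        return "double"
    if isinstance(datum, str):
        return "string"
    if isinstance(datum, dict):
        return "record"
    if isinstance(datum, list):
        return "array"
    raise TypeError("unsupported datum: %r" % (datum,))


def branch_by_type(union, datum):
    """Fast path: choose the branch from the datum's top-level type only.

    Assumes no two branches share a top-level kind; a union holding
    several record schemas would need an extra tie-break.
    """
    kind = kind_of(datum)
    for index, branch in enumerate(union):
        branch_kind = branch if isinstance(branch, str) else branch["type"]
        if branch_kind == kind:
            return index
    raise TypeError("datum matches no branch")


point = {"type": "record", "name": "Point", "fields": [
    {"name": "x", "type": "int"},
    {"name": "label", "type": ["null", "string"]}]}
union = ["null", "string", point]
datum = {"x": 1, "label": None}
print(branch_by_validation(union, datum), branch_by_type(union, datum))
```

The fast path never descends into the record's fields, so its cost does not grow with the size of the datum. The trade-off is that it does not catch a malformed record at branch-selection time; that check would have to happen when the fields themselves are written.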

All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]