-
Avro Python package slowness
Miki Tebeka 2011-05-06, 17:34
Greetings,
I'm using the avro Python package (1.5.0), and it is slow. It takes about 1 min to process a file of 33K records. For comparison, the Java package processes the same file in 1 sec.
Any ideas on how to speed that up?
All the best,
--
Miki
-
Re: Avro Python package slowness
Scott Carey 2011-05-06, 17:40
Can you try the 1.5.1 release candidate?
http://people.apache.org/~cutting/avro-1.5.1-rc0/
It should be faster than 1.5.0, but it's very unlikely to match Java.
On 5/6/11 10:34 AM, "Miki Tebeka" <[EMAIL PROTECTED]> wrote:
>Greetings,
>
>I'm using the avro python package (1.5.0), and it is slow.
>It takes about 1min to process 33K records file. For comparison the
>Java packages process the same file in 1sec.
>
>Any ideas on how to speed that up?
>
>All the best,
>--
>Miki
-
Re: Avro Python package slowness
Doug Cutting 2011-05-06, 17:58
On 05/06/2011 10:34 AM, Miki Tebeka wrote:
> I'm using the avro python package (1.5.0), and it is slow.
> It takes about 1min to process 33K records file. For comparison the
> Java packages process the same file in 1sec.
>
> Any ideas on how to speed that up?
Does the schema have unions? Last I checked, the Python implementation recursively validates data in order to determine which branch of a union should be written. In the worst case (nested unions) this can lead to quadratic serialization times. It should be possible to determine the union branch to write much more efficiently.
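The cost described above can be illustrated with a toy model. This is a minimal sketch, not the avro package's code: the schema representation, `validate`, `pick_branch_by_validation` and `pick_branch_by_type` are all hypothetical names made up for the example. The type-based picker is only one possible reading of "much more efficiently"; it relies on the Avro rule that a union may not contain two branches of the same unnamed type, and it applies to writing only.

```python
# Toy schema model (an assumption for illustration, not avro internals):
# "null", "long", "string", {"type": "array", "items": S}, or a list for a union.

calls = {"validate": 0}  # counts how many times validate() runs


def kind_of(schema):
    """Return the top-level kind of a non-union schema."""
    return schema["type"] if isinstance(schema, dict) else schema


def matches_top_level(schema, datum):
    """Check only the datum's outermost Python type against the schema."""
    kind = kind_of(schema)
    if kind == "null":
        return datum is None
    if kind == "long":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "array":
        return isinstance(datum, list)
    raise ValueError("unknown schema: %r" % (schema,))


def validate(schema, datum):
    """Full recursive validation: walks the entire datum."""
    calls["validate"] += 1
    if isinstance(schema, list):  # union: any branch may match
        return any(validate(branch, datum) for branch in schema)
    if not matches_top_level(schema, datum):
        return False
    if kind_of(schema) == "array":
        return all(validate(schema["items"], item) for item in datum)
    return True


def pick_branch_by_validation(union, datum):
    """Pick a union branch by fully validating the datum against each one."""
    for index, branch in enumerate(union):
        if validate(branch, datum):
            return index
    raise TypeError("datum matches no branch")


def pick_branch_by_type(union, datum):
    """Pick a union branch by looking at the top-level type only."""
    for index, branch in enumerate(union):
        if matches_top_level(branch, datum):
            return index
    raise TypeError("datum matches no branch")


# A union whose array branch itself contains a union for every item.
union = ["null", {"type": "array", "items": ["null", "long"]}]
datum = list(range(1000))
```

In this model, picking the branch by validation touches every item of the array before anything is written, and a writer that repeats this at each nested union re-walks the same data again. The type-based picker does constant work per union.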
It would be great to have some performance benchmarks for Python, as we do for Java.
Doug
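A benchmark of the kind mentioned above could start from something as small as the following. It is a sketch under assumptions: `time_records` is a hypothetical helper, and it times any iterable of records so that it stays independent of the avro package.

```python
import time


def time_records(records):
    """Consume an iterable of records; return (count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


# Stand-in data; a real run would iterate over decoded Avro records instead.
count, elapsed = time_records(iter(range(33000)))
print("%d records in %.3f s" % (count, elapsed))
```

With the avro package, `records` would be a reader such as `DataFileReader(open(path, "rb"), DatumReader())` from `avro.datafile` and `avro.io`; that wiring is not exercised here.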
-
Re: Avro Python package slowness
Miki Tebeka 2011-05-06, 18:18
Greetings,
>>I'm using the avro python package (1.5.0), and it is slow.
> Can you try the 1.5.1 release candidate?
> http://people.apache.org/~cutting/avro-1.5.1-rc0/
This trimmed it down to 30sec, nice!
BTW: It'll be nice to have a __version__ in avro/__init__.py
All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
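For illustration, the `__version__` request amounts to something like the sketch below. The version string is a placeholder and `version_tuple` is a hypothetical helper that assumes purely numeric dotted versions (so not a suffix like "-rc0"); neither is taken from the avro package.

```python
# Hypothetical contents for avro/__init__.py; the value is a placeholder.
__version__ = "1.5.1"


def version_tuple(version):
    """Turn a dotted numeric version like '1.5.1' into (1, 5, 1)."""
    return tuple(int(part) for part in version.split("."))


# Callers could then check which release they are running against.
is_newer_than_150 = version_tuple(__version__) > version_tuple("1.5.0")
```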
-
Re: Avro Python package slowness
Doug Cutting 2011-05-06, 18:28
On 05/06/2011 11:18 AM, Miki Tebeka wrote:
> BTW: It'll be nice to have a __version__ in avro/__init__.py
Please file an issue in Jira and submit a patch, if you are able.
https://issues.apache.org/jira/browse/AVRO
Thanks,
Doug
-
Re: Avro Python package slowness
Miki Tebeka 2011-05-06, 19:31
Greetings,
>> BTW: It'll be nice to have a __version__ in avro/__init__.py
> Please file an issue in Jira and submit a patch, if you are able.
Done - https://issues.apache.org/jira/browse/AVRO-817
BTW: When is 1.5.1 coming out?
All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
-
Re: Avro Python package slowness
Doug Cutting 2011-05-06, 21:06
On 05/06/2011 12:31 PM, Miki Tebeka wrote:
> BTW: When is 1.5.1 coming out?
It's out today!
Doug
-
Re: Avro Python package slowness
Miki Tebeka 2011-05-06, 21:13
Greetings,
>> BTW: When is 1.5.1 coming out?
> It's out today!
Great, thanks!
All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
-
Re: Avro Python package slowness
Miki Tebeka 2011-05-06, 23:20
>> I'm using the avro python package (1.5.0), and it is slow.
> Does the schema have unions? Last I checked, python recursively
> validates data in order to determine which branch of a union should be
> written. In the worst case (nested unions) this can lead to quadratic
> serialization times.
There are many unions, but not nested ones.
> It should be possible to determine the union
> branch to write much more efficiently.
Can you elaborate on how? I'll try to code this and patch.
Also, I'm talking about reading the avro file, not writing to it.
All the best,
--
Miki
[I don't suffer from insanity, I enjoy every minute of it]
-
Re: Avro Python package slowness
Doug Cutting 2011-05-11, 10:16
On 05/06/2011 04:20 PM, Miki Tebeka wrote:
>> It should be possible to determine the union
>> branch to write much more efficiently.
> Can you elaborate on how? I'll try to code this and patch.
> Also, I'm talking about reading the avro file, not writing to it.
The optimization I was speaking of is for writing, not reading.
Doug