Drill, mail # dev - Steam lib


Re: Steam lib
Matt Abrams 2013-11-14, 14:17
Great!  I have not used Q digest in production yet but I believe
Eugene, the author of stream-lib's Q digest implementation, has.
Eugene, can you comment on how it performs in practice?

Matt
On Wed, Nov 13, 2013 at 4:45 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>
> As soon as it is definitely done.  I have one more significant improvement to make so that it works on sequential values.  I will hand the code to Suneel, who will be packaging it for Mahout. You can definitely have it at the same time.
>
> I would love a review from you guys when I am ready. The theory doc is nearly at that point.  Would you like me to start there?  Also, can I get some info from you about how Q digests work in practice?
>
> Sent from my iPhone
>
> On Nov 13, 2013, at 20:46, Matt Abrams <[EMAIL PROTECTED]> wrote:
>
>> Ted -
>>
>> Any chance we can add your quantile estimator to stream-lib?
>>
>> Matt
>>
>> On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>> I also have a new quantile estimator that dominates all other
>>> implementations that I know of on speed and accuracy (10us per point added,
>>> 8K data size to get a few ppm accuracy for high or low quantiles and about
>>> 0.05% accuracy on middle quantiles like the median).
>>>
>>>
>>>
>>>
>>> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Summingbird uses Algebird. I think Stripe might also have a library; Avi
>>>> Bryant was toying with this for a while.
>>>>
>>>> Algebird has some nice features like not doing approximation at all for
>>>> small sets (just use the real values), etc. We also recently did a bunch of
>>>> work to make sure we can serialize all approximate structures so they can
>>>> be correctly reused by different computations, sent across the wire, etc.
>>>>
>>>> I don't recall doing speed comparisons and the like; it would be
>>>> interesting to see them if you guys are choosing what library to use.
>>>>
>>>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> stream-lib is used quite widely and is generally high quality.
>>>>>
>>>>> The other competitive library is Brick House from Klout.
>>>>> http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Just saw this library today and thought it's something we can
>>>>>> potentially leverage:
>>>>>>
>>>>>> https://github.com/addthis/stream-lib
>>>>>>
>>>>>> It has a number of algorithms for approximating streams and has code for
>>>>>> cardinality estimation (HyperLogLog) and others.
>>>>>>
>>>>>> Looks like Twitter's Summingbird uses this library too.
>>>>>>
>>>>>> Tim
>>>>