Drill >> mail # dev >> Steam lib


Re: Steam lib
Great!  I have not used Q digest in production yet but I believe
Eugene, the author of stream-lib's Q digest implementation, has.
Eugene, can you comment on how it performs in practice?

Matt
On Wed, Nov 13, 2013 at 4:45 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>
> As soon as it is definitely done. I have one more significant improvement to make so that it works on sequential values. I will hand the code to Suneel, who will be packaging it for Mahout. You can definitely have it at the same time.
>
> I would love a review from you guys when I am ready. The theory doc is nearly at that point. Would you like me to start there? Also, can I get some info from you about how Q digests work in practice?
>
> Sent from my iPhone
>
> On Nov 13, 2013, at 20:46, Matt Abrams <[EMAIL PROTECTED]> wrote:
>
>> Ted -
>>
>> Any chance we can add your quantile estimator to stream-lib?
>>
>> Matt
>>
>> On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>> I also have a new quantile estimator that dominates all other
>>> implementations that I know of on speed and accuracy (10us per point added,
>>> 8K data size to get a few ppm accuracy for high or low quantiles and about
>>> 0.05% accuracy on middle quantiles like the median).
>>>
>>>
>>>
>>>
>>> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Summingbird uses algebird. I think Stripe might also have a library, Avi
>>>> Bryant was toying with this for a while.
>>>>
>>>> Algebird has some nice features like not doing approximation at all for
>>>> small sets (just use the real values), etc. We also recently did a bunch of
>>>> work to make sure we can serialize all approximate structures so they can
>>>> be correctly reused by different computations, sent across the wire, etc.
>>>>
>>>> I don't recall doing speed comparisons and the like; it would be
>>>> interesting to see them if you guys are choosing what library to use.
>>>>
>>>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> stream-lib is used quite widely and is generally high quality.
>>>>>
>>>>> The other competitive library is Brickhouse from Klout:
>>>>> http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Just saw this library today and thought it's something we can potentially
>>>>>> leverage:
>>>>>>
>>>>>> https://github.com/addthis/stream-lib
>>>>>>
>>>>>> It has a number of algorithms for approximate stream summarization and has code for
>>>>>> cardinality estimation (HyperLogLog) and others.
>>>>>>
>>>>>> Looks like Twitter's Summingbird uses this library too.
>>>>>>
>>>>>> Tim
>>>>
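
Dmitriy's point above about skipping approximation for small sets (just keeping the real values) can be illustrated with a minimal sketch. This is not Algebird's or stream-lib's code: the class name `HybridDistinctCounter`, the `exact_limit` and `k` parameters, and the use of a k-minimum-values (KMV) estimator in place of HyperLogLog are all assumptions made for illustration.

```python
import hashlib
import heapq


class HybridDistinctCounter:
    """Hypothetical counter: exact while small, KMV estimate once large."""

    def __init__(self, exact_limit=1000, k=256):
        self.exact_limit = exact_limit  # assumed switch-over threshold
        self.k = k                      # number of smallest hashes retained
        self.exact = set()              # real values, used while the set is small
        self.heap = []                  # negated hashes: a max-heap of the k smallest
        self.members = set()            # hashes currently held in the heap

    @staticmethod
    def _hash(item):
        # Map an item to a deterministic pseudo-uniform float in [0, 1].
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0 ** 64

    def _add_to_sketch(self, h):
        if h in self.members:
            return
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.members.add(h)
        elif h < -self.heap[0]:
            # h is smaller than the current k-th smallest, so replace it.
            evicted = -heapq.heappushpop(self.heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def add(self, item):
        if self.exact is not None:
            self.exact.add(item)
            if len(self.exact) > self.exact_limit:
                # Too large to keep exactly: migrate to the sketch.
                for value in self.exact:
                    self._add_to_sketch(self._hash(value))
                self.exact = None
        else:
            self._add_to_sketch(self._hash(item))

    def count(self):
        if self.exact is not None:
            return len(self.exact)       # exact answer for small sets
        if len(self.heap) < self.k:
            return len(self.heap)        # fewer than k distinct hashes seen
        kth_smallest = -self.heap[0]
        return int((self.k - 1) / kth_smallest)  # standard KMV estimate
```

With these assumed defaults the counter returns exact answers up to 1000 distinct items and switches to an estimate beyond that; a larger `k` trades memory for accuracy.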