Drill >> mail # dev >> Stream lib


Re: Stream lib
Ted -

Any chance we can add your quantile estimator to stream-lib?

Matt

On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> I also have a new quantile estimator that dominates all other
> implementations that I know of on speed and accuracy (10us per point added,
> 8K data size to get a few ppm accuracy for high or low quantiles and about
> 0.05% accuracy on middle quantiles like the median).
>
>
>
>
> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> Summingbird uses algebird. I think Stripe might also have a library, Avi
>> Bryant was toying with this for a while.
>>
>> Algebird has some nice features like not doing approximation at all for
>> small sets (just use the real values), etc. We also recently did a bunch of
>> work to make sure we can serialize all approximate structures so they can
>> be correctly reused by different computations, sent across the wire, etc.
>>
>> I don't recall doing speed comparisons and the like, it would be
>> interesting to see them if you guys are choosing what library to use.
>>
>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>> > stream-lib is used quite widely and is generally high quality.
>> >
>> > The other competitive library is Brickhouse from Klout.
>> >
>> >
>> > http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>> >
>> >
>> >
>> >
>> > On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
>> >
>> >> Just saw this library today and thought it's something we can
>> >> potentially leverage:
>> >>
>> >> https://github.com/addthis/stream-lib
>> >>
>> >> It has a number of algorithms for approximation over streams, including
>> >> code for cardinality estimation (HyperLogLog) and others.
>> >>
>> >> Looks like Twitter's Summingbird uses this library too.
>> >>
>> >> Tim
>> >>
>>
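
The thread mentions that Algebird skips approximation for small sets and just keeps the real values. A minimal Python sketch of that general idea follows, for illustration only. It is not Algebird's or stream-lib's API or implementation: the class name `HybridDistinctCounter`, the parameter `k`, and the helper `_hash01` are all hypothetical, and a simple K-minimum-values estimator stands in for HyperLogLog to keep the example short.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) via SHA-1.

    Assumes repr(item) is stable and distinct for distinct items.
    """
    digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class HybridDistinctCounter:
    """Hypothetical counter: exact while small, estimated once large."""

    def __init__(self, k=256):
        self.k = k
        self.exact = set()   # real values, used while at most k items
        self.heap = None     # negated hashes: a max-heap of the k smallest
        self.members = None  # hashes currently held in the heap

    def add(self, item):
        if self.exact is not None:
            self.exact.add(item)
            if len(self.exact) > self.k:
                self._switch_to_sketch()
            return
        self._offer(_hash01(item))

    def _switch_to_sketch(self):
        # Keep only the k smallest hash values and drop the raw items.
        hashes = sorted({_hash01(x) for x in self.exact})[: self.k]
        self.heap = [-h for h in hashes]
        heapq.heapify(self.heap)
        self.members = set(hashes)
        self.exact = None

    def _offer(self, h):
        if h in self.members:
            return  # already counted
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, -h)
            self.members.add(h)
        elif h < -self.heap[0]:
            # Replace the current k-th smallest hash with the new one.
            evicted = -heapq.heappushpop(self.heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def count(self):
        if self.exact is not None:
            return len(self.exact)  # no approximation for small sets
        kth_smallest = -self.heap[0]
        return int((len(self.heap) - 1) / kth_smallest)
```

With `k=256` the counter returns exact answers for up to 256 distinct items; beyond that the estimate typically has a relative error of a few percent, and a larger `k` trades memory for accuracy.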