

Re: prefix compression
Jason - are you feeding it that whole string for each date?  Input data is
17 bytes per record * 50M records = 850MB, and that reduces to 984 bytes?
Is it possible to compress by that much?  Maybe I'm missing something about
how the FST works.

Matt
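
The ratio Matt is asking about is plausible because an FST over sorted keys
shares both prefixes and suffixes, and with PositiveIntOutputs the per-key
long rides along as arc outputs: 50M timestamps that differ only by a constant
increment collapse into a very small, very regular automaton, while the bulky
part (the values) stays outside the FST. Below is a minimal build sketch,
assuming the Lucene org.apache.lucene.util.fst API of roughly that era
(Builder, PositiveIntOutputs, Util); constructor and method signatures vary
between Lucene versions, and the short key list stands in for the 50 million
one-millisecond increments.

// Sketch: build an FST over sorted keys, each mapped to an incrementing
// long (e.g. an offset into MMap'd value data). Based on the Lucene
// org.apache.lucene.util.fst API circa 3.x/4.x; constructor and method
// names differ between Lucene versions.
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstBuildSketch {
  public static void main(String[] args) throws Exception {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(true);
    Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);

    // Stand-in for the 50M yyyyMMddHHmmssSSS keys; the Builder requires
    // keys to be added in sorted order.
    String[] keys = {
      "20110603205100000", "20110603205100001", "20110603205100002",
      "20110603205100003", "20110603205100004"
    };

    BytesRef scratchBytes = new BytesRef();
    IntsRef scratchInts = new IntsRef();
    for (int i = 0; i < keys.length; i++) {
      scratchBytes.copyChars(keys[i]);
      // Output is a 1-based ordinal standing in for the pointer into the
      // external value data (PositiveIntOutputs expects positive longs).
      builder.add(Util.toIntsRef(scratchBytes, scratchInts), Long.valueOf(i + 1));
    }
    FST<Long> fst = builder.finish();

    // Look a key back up; the returned long is the stored pointer/ordinal.
    scratchBytes.copyChars("20110603205100003");
    System.out.println("ordinal = " + Util.get(fst, scratchBytes));
  }
}

Because each key shares almost all of its bytes with its neighbors and the
outputs grow by a constant, nearly every arc and output prefix is shared,
which is presumably how 50M keys can boil down to under a kilobyte of
automaton plus the external value data.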
On Fri, Jun 3, 2011 at 8:51 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:

> Also the next thing to measure with the FST is the key lookup speed.
> I'm not sure what that'd look like, or how to compare with HBase right
> now?
>
> On Fri, Jun 3, 2011 at 8:42 PM, Jason Rutherglen
> <[EMAIL PROTECTED]> wrote:
> > Here's a nice preliminary number with the FST, 50 million dates of the
> > form yyyyMMddHHmmssSSS, with each incremented by one millisecond.  The
> > FST is 984 bytes, with an incrementing long to point to the presumably
> > MMap'd value data.  This's a bit crazy.
> >
> > Perhaps we should try other increments as well?  Given that HBase keys
> > especially are probably close increments of each other, I think the
> > FST can always be loaded into RAM with pointers out to the actual
> > values.
> >
>
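
On the lookup-speed question in the quoted thread, a rough micro-benchmark
over the in-RAM FST is probably the simplest starting point: time Util.get
for a batch of random keys, and compare that against the corresponding
HBase-side seek (HFile block index plus in-block scan), which is not sketched
here. A hypothetical timing helper, reusing an FST<Long> built as above and
the same era-specific Lucene API assumptions:

// Rough lookup-timing sketch for an already-built FST<Long>; the HBase side
// of the comparison is not shown.
import java.util.Random;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.Util;

public class FstLookupBench {
  // Returns elapsed nanoseconds for 'iters' random key lookups.
  public static long time(FST<Long> fst, String[] keys, int iters) throws Exception {
    Random rnd = new Random(42);
    BytesRef scratch = new BytesRef();
    long sink = 0;                        // keeps the JIT from dropping the lookups
    long start = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      scratch.copyChars(keys[rnd.nextInt(keys.length)]);
      Long ord = Util.get(fst, scratch);  // null if the key is not in the FST
      if (ord != null) {
        sink += ord;
      }
    }
    long elapsed = System.nanoTime() - start;
    System.out.println("sink=" + sink + "  ns/lookup=" + (elapsed / (double) iters));
    return elapsed;
  }
}

This only ballparks the in-memory FST walk; a fair comparison with HBase would
need warm-up, a realistic key distribution, and the full read path on the
HBase side.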