HBase >> mail # dev >> prefix compression

Re: prefix compression
Jason - are you feeding it that whole string for each date?  The input data is
17 bytes per record * 50 million records = 850 MB, and that reduces to 984 bytes?
Is it possible to compress by that much?  Maybe I'm missing something about
how the FST works.
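
For what it's worth, a rough way to sanity-check the order of magnitude is to count the states of a minimal acyclic automaton over a smaller run of such keys. The sketch below is only an illustration, not Lucene's FST builder: `make_keys` and `minimal_state_count` are hypothetical helper names, the start date is an assumption, the sample is 20,000 keys rather than 50 million, and it counts states only (no output longs, no byte-level encoding).

```python
from datetime import datetime, timedelta

# Raw input size quoted above: 17 bytes per key * 50 million keys.
RAW_BYTES = 17 * 50_000_000  # 850,000,000 bytes, i.e. 850 MB

END = ""  # marker key meaning "a key ends at this trie node"


def make_keys(start, count):
    """Yield `count` keys of the form yyyyMMddHHmmssSSS, 1 ms apart."""
    for i in range(count):
        t = start + timedelta(milliseconds=i)
        yield t.strftime("%Y%m%d%H%M%S") + f"{t.microsecond // 1000:03d}"


def minimal_state_count(keys):
    """Count states of the minimal acyclic automaton accepting `keys`.

    Builds a plain trie, then merges nodes whose outgoing structure is
    identical (same finality, same labelled transitions to merged states).
    """
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = True

    registry = {}  # canonical signature -> merged state id

    def canon(node):
        arcs = tuple(sorted(
            (ch, canon(child)) for ch, child in node.items() if ch != END
        ))
        sig = (END in node, arcs)
        return registry.setdefault(sig, len(registry))

    canon(root)
    return len(registry)


if __name__ == "__main__":
    start = datetime(2011, 6, 3)  # assumed start; any date would do
    n = 20_000
    states = minimal_state_count(make_keys(start, n))
    print(f"raw input for 50M keys: {RAW_BYTES} bytes")
    print(f"{n} sequential keys -> {states} automaton states")
```

Because consecutive keys share both long prefixes and whole suffix ranges, the merged state count stays tiny compared with the number of keys, which is at least consistent with an FST well under a kilobyte.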

On Fri, Jun 3, 2011 at 8:51 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:

> Also, the next thing to measure with the FST is the key lookup speed.
> I'm not sure what that would look like, or how to compare it with HBase
> right now.
> On Fri, Jun 3, 2011 at 8:42 PM, Jason Rutherglen
> <[EMAIL PROTECTED]> wrote:
> > Here's a nice preliminary number with the FST, 50 million dates of the
> > form yyyyMMddHHmmssSSS, with each incremented by one millisecond.  The
> > FST is 984 bytes, with an incrementing long to point to the presumably
> > MMap'd value data.  This is a bit crazy.
> >
> > Perhaps we should try other increments as well?  Given that HBase keys
> > especially are probably close increments of each other, I think the
> > FST can always be loaded into RAM with pointers out to the actual
> > values.
> >