HBase >> mail # dev >> Documenting Guidance on compression and codecs


Re: Documenting Guidance on compression and codecs
LZ4 is at least 2x faster than Snappy, with comparable compression.

BLOCK_ENCODING makes sense only if keys are roughly the same size as values
(time-series type of data), since it compresses only keys.
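
To put a rough number on that: if an encoding touches only the key bytes, the most it can save is the share of each cell taken up by the key. The sketch below is plain arithmetic, not an HBase API; the function name and the example cell shapes are made up for illustration, and it gives a best-case ceiling, not a prediction.

```python
def max_saving_from_key_encoding(key_bytes: int, value_bytes: int) -> float:
    """Best-case fraction of a cell's bytes that a key-only encoding could remove.

    Assumes the encoding can shrink keys to (almost) nothing and leaves
    values untouched, so this is an upper bound, not an estimate.
    """
    if key_bytes < 0 or value_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = key_bytes + value_bytes
    if total == 0:
        return 0.0
    return key_bytes / total


# Hypothetical cell shapes, chosen only to show the two extremes.
examples = [
    ("time-series (key ~ value)", 40, 40),
    ("large values", 40, 1000),
]
for label, key_bytes, value_bytes in examples:
    bound = max_saving_from_key_encoding(key_bytes, value_bytes)
    print(f"{label:<28} best-case saving: {bound:.1%}")
```

With large values the ceiling is only a few percent, so the encoding has little to gain. Actual results can fall well short of the ceiling: in Nick's table below, PREFIX and DIFF came out larger than the unencoded data.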
On Wed, Sep 11, 2013 at 1:22 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:

> To make things even more interesting I've been testing lz4 recently
> and it's been doing very well on my ycsb runs.  So there's another
> option to add.
>
> On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> > Do we have a consolidated resource with information and recommendations
> > about use of the above? For instance, I ran a simple test using
> > PerformanceEvaluation, examining just the size of data on disk for 1G of
> > input data. The matrix below has some surprising results:
> >
> > +--------------------+--------------+
> > | MODIFIER           | SIZE (bytes) |
> > +--------------------+--------------+
> > | none               |   1108553612 |
> > +--------------------+--------------+
> > | compression:SNAPPY |    427335534 |
> > +--------------------+--------------+
> > | compression:LZO    |    270422088 |
> > +--------------------+--------------+
> > | compression:GZ     |    152899297 |
> > +--------------------+--------------+
> > | codec:PREFIX       |   1993910969 |
> > +--------------------+--------------+
> > | codec:DIFF         |   1960970083 |
> > +--------------------+--------------+
> > | codec:FAST_DIFF    |   1061374722 |
> > +--------------------+--------------+
> > | codec:PREFIX_TREE  |   1066586604 |
> > +--------------------+--------------+
> >
> > Where does a wayward soul look for guidance on which combination of the
> > above to choose for their application?
> >
> > Thanks,
> > Nick
>
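
For comparing the options, the byte counts in Nick's table above are easier to read as a fraction of the "none" baseline. A small sketch that does the division; the numbers are copied from the table, while the function and variable names are only for illustration.

```python
# On-disk sizes in bytes, copied from the table in Nick's message.
SIZES = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}


def relative_sizes(sizes, baseline_key="none"):
    """Return each size as a fraction of the baseline (below 1.0 means smaller on disk)."""
    baseline = sizes[baseline_key]
    return {name: size / baseline for name, size in sizes.items()}


ratios = relative_sizes(SIZES)
# Print smallest first so the most effective option is at the top.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<20} {ratio:6.3f}")
```

On these numbers GZ gives the smallest files, FAST_DIFF and PREFIX_TREE land just under the baseline, and PREFIX and DIFF come out larger than no encoding at all, which is the surprising part of the matrix.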