HBase, mail # dev - Documenting Guidance on compression and codecs


Re: Documenting Guidance on compression and codecs
Elliott Clark 2013-09-11, 20:22
To make things even more interesting, I've been testing LZ4 recently
and it's been doing very well on my YCSB runs. So that's another
option to add.

On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> Do we have a consolidated resource with information and recommendations
> about the use of compression and codecs? For instance, I ran a simple test
> using PerformanceEvaluation, examining just the size of data on disk for
> 1 GB of input data. The matrix below has some surprising results:
>
> +--------------------+--------------+
> | MODIFIER           | SIZE (bytes) |
> +--------------------+--------------+
> | none               |   1108553612 |
> +--------------------+--------------+
> | compression:SNAPPY |    427335534 |
> +--------------------+--------------+
> | compression:LZO    |    270422088 |
> +--------------------+--------------+
> | compression:GZ     |    152899297 |
> +--------------------+--------------+
> | codec:PREFIX       |   1993910969 |
> +--------------------+--------------+
> | codec:DIFF         |   1960970083 |
> +--------------------+--------------+
> | codec:FAST_DIFF    |   1061374722 |
> +--------------------+--------------+
> | codec:PREFIX_TREE  |   1066586604 |
> +--------------------+--------------+
>
> Where does a wayward soul look for guidance on which combination of the
> above to choose for their application?
>
> Thanks,
> Nick
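
To make the quoted table easier to compare, the sizes can be expressed as a fraction of the uncompressed baseline. The following is a minimal sketch in Python, not part of HBase or PerformanceEvaluation; the names SIZES and ratios_vs_baseline are illustrative assumptions, and the byte counts are copied from the table as posted.

```python
# Sizes in bytes, copied from the table in the quoted message.
SIZES = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}


def ratios_vs_baseline(sizes, baseline="none"):
    """Return each modifier's on-disk size as a fraction of the baseline size.

    `baseline` is the key of the row to compare against (hypothetical helper,
    written only for this illustration).
    """
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    # Print rows from smallest to largest relative size.
    ranked = sorted(ratios_vs_baseline(SIZES).items(), key=lambda kv: kv[1])
    for name, ratio in ranked:
        print(f"{name:<20} {ratio:5.2f} of baseline")
```

On these numbers GZ gives the smallest footprint, while PREFIX and DIFF come out larger than the uncompressed baseline, which is presumably the surprising part. The sketch only restates the posted sizes; it says nothing about read or write throughput, which is what the YCSB runs mentioned in the reply would measure.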