HBase, mail # dev - Documenting Guidance on compression and codecs


Re: Documenting Guidance on compression and codecs
Ted Yu 2013-09-11, 20:19
Regarding Data Block Encoding, there is some measurement data in this comment on HBASE-4218:

https://issues.apache.org/jira/browse/HBASE-4218?focusedCommentId=13123337&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13123337

On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> Do we have a consolidated resource with information and recommendations
> about use of the above? For instance, I ran a simple test using
> PerformanceEvaluation, examining just the size of data on disk for 1G of
> input data. The matrix below has some surprising results:
>
> +--------------------+--------------+
> | MODIFIER           | SIZE (bytes) |
> +--------------------+--------------+
> | none               |   1108553612 |
> +--------------------+--------------+
> | compression:SNAPPY |    427335534 |
> +--------------------+--------------+
> | compression:LZO    |    270422088 |
> +--------------------+--------------+
> | compression:GZ     |    152899297 |
> +--------------------+--------------+
> | codec:PREFIX       |   1993910969 |
> +--------------------+--------------+
> | codec:DIFF         |   1960970083 |
> +--------------------+--------------+
> | codec:FAST_DIFF    |   1061374722 |
> +--------------------+--------------+
> | codec:PREFIX_TREE  |   1066586604 |
> +--------------------+--------------+
>
> Where does a wayward soul look for guidance on which combination of the
> above to choose for their application?
>
> Thanks,
> Nick
>
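
To make the quoted table easier to compare at a glance, here is a minimal sketch that expresses each on-disk size as a fraction of the unmodified ("none") baseline. The byte counts are copied from the table above; the names `SIZES` and `ratios_vs_baseline` are made up for illustration and are not part of HBase or PerformanceEvaluation.

```python
# Sizes in bytes, copied verbatim from the table in the quoted message.
SIZES = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}


def ratios_vs_baseline(sizes, baseline="none"):
    """Return each size divided by the baseline size (hypothetical helper)."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    # Print the smallest on-disk footprint first.
    ranked = sorted(ratios_vs_baseline(SIZES).items(), key=lambda kv: kv[1])
    for name, ratio in ranked:
        print(f"{name:<20} {ratio:6.3f}x of baseline")
```

A ratio above 1.0 means the modifier produced more data on disk than the baseline did. In this particular run that is the case for codec:PREFIX and codec:DIFF, which is the surprising part of the table.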