HBase >> mail # dev >> Documenting Guidance on compression and codecs


Documenting Guidance on compression and codecs
Do we have a consolidated resource with information and recommendations
about the use of compression and codecs? For instance, I ran a simple test
using PerformanceEvaluation, examining just the size of data on disk for
1 GB of input data. The matrix below has some surprising results; notably,
the PREFIX and DIFF codecs produce more data on disk than no modifier at all:

+--------------------+--------------+
| MODIFIER           | SIZE (bytes) |
+--------------------+--------------+
| none               |   1108553612 |
+--------------------+--------------+
| compression:SNAPPY |    427335534 |
+--------------------+--------------+
| compression:LZO    |    270422088 |
+--------------------+--------------+
| compression:GZ     |    152899297 |
+--------------------+--------------+
| codec:PREFIX       |   1993910969 |
+--------------------+--------------+
| codec:DIFF         |   1960970083 |
+--------------------+--------------+
| codec:FAST_DIFF    |   1061374722 |
+--------------------+--------------+
| codec:PREFIX_TREE  |   1066586604 |
+--------------------+--------------+
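
To make the matrix easier to compare, here is a minimal sketch that expresses each size as a ratio of the "none" row. The sizes are copied from the table above; the dictionary and function names are illustrative only and are not part of PerformanceEvaluation or any HBase API.

```python
# On-disk sizes in bytes, copied verbatim from the matrix above.
SIZES = {
    "none": 1108553612,
    "compression:SNAPPY": 427335534,
    "compression:LZO": 270422088,
    "compression:GZ": 152899297,
    "codec:PREFIX": 1993910969,
    "codec:DIFF": 1960970083,
    "codec:FAST_DIFF": 1061374722,
    "codec:PREFIX_TREE": 1066586604,
}


def ratios_vs_baseline(sizes, baseline="none"):
    """Return each modifier's size divided by the baseline row's size.

    A value below 1.0 means less data on disk than the baseline;
    a value above 1.0 means more.
    """
    base = sizes[baseline]
    return {modifier: size / base for modifier, size in sizes.items()}


if __name__ == "__main__":
    ratios = ratios_vs_baseline(SIZES)
    # Print the modifiers from smallest to largest on-disk footprint.
    for modifier, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{modifier:<20} {ratio:5.2f}x")
```

Read this way, the three compression settings all land well under the baseline, FAST_DIFF and PREFIX_TREE land only slightly under it, and PREFIX and DIFF land well above it. These numbers describe this one PerformanceEvaluation run only; other data shapes may behave differently.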

Where does a wayward soul look for guidance on which combination of the
above to choose for their application?

Thanks,
Nick