HBase >> mail # dev >> Documenting Guidance on compression and codecs


Re: Documenting Guidance on compression and codecs
Here is another set of random data points:

Data generated with LoadTestTool, which inherently has some randomness.
However, these should be good for ballpark figures.

hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 \
  -num_keys 1000000 -read 100:30 -num_tables 1 \
  -data_block_encoding NONE -tn load_test_tool_NONE

279723839  /apps/hbase/data/data/default/load_test_tool_NONE
103100244  /apps/hbase/data/data/default/load_test_tool_DIFF
103432465  /apps/hbase/data/data/default/load_test_tool_FAST_DIFF
134790042  /apps/hbase/data/data/default/load_test_tool_PREFIX
97963420  /apps/hbase/data/data/default/load_test_tool_PREFIX_TREE

78579277  /apps/hbase/data/data/default/load_test_tool_GZ
105321959  /apps/hbase/data/data/default/load_test_tool_SNAPPY
108040063  /apps/hbase/data/data/default/load_test_tool_LZO
110784379  /apps/hbase/data/data/default/load_test_tool_LZ4

78059199  /apps/hbase/data/data/default/load_test_tool_SNAPPY_FAST_DIFF
77214771  /apps/hbase/data/data/default/load_test_tool_LZO_FAST_DIFF
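
To make the figures above easier to compare, here is a minimal Python sketch (not part of the original message) that expresses each directory size as a fraction of the unencoded, uncompressed table. The byte counts are copied from the listing; the dictionary keys are just the table-name suffixes, and the function name `ratios` is made up for illustration.

```python
# Sizes in bytes, copied from the directory listing above.
# Keys are the suffixes of the load_test_tool_* table names.
SIZES = {
    "NONE": 279723839,
    "DIFF": 103100244,
    "FAST_DIFF": 103432465,
    "PREFIX": 134790042,
    "PREFIX_TREE": 97963420,
    "GZ": 78579277,
    "SNAPPY": 105321959,
    "LZO": 108040063,
    "LZ4": 110784379,
    "SNAPPY_FAST_DIFF": 78059199,
    "LZO_FAST_DIFF": 77214771,
}


def ratios(sizes, baseline="NONE"):
    """Return each size divided by the baseline size (1.0 means no savings)."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    # Print the smallest on-disk footprint first.
    for name, ratio in sorted(ratios(SIZES).items(), key=lambda kv: kv[1]):
        print(f"{name:<18}{ratio:6.1%}")
```

This only compares on-disk size; it says nothing about read or write speed.
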
Enis

On Tue, Sep 24, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> According to
> <http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO>,
> LZ4 is faster than LZOP but consumes much more memory.
>
> Cheers
>
>
> On Wed, Sep 18, 2013 at 8:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Do you have any numbers on compression speed, too?
> > I continue to be surprised by the relative compression ratios between LZ4,
> > LZO, and SNAPPY.
> > I had expected SNAPPY and LZO to be roughly equivalent and LZ4 to be far
> > better than LZO.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Nick Dimiduk <[EMAIL PROTECTED]>
> > To: hbase-dev <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 18, 2013 5:19 PM
> > Subject: Re: Documenting Guidance on compression and codecs
> >
> >
> > For completeness, here's an entry for LZ4:
> >
> > +--------------------+--------------+
> > | compression:LZ4    |    391017061 |
> > +--------------------+--------------+
> >
> >
> >
> > On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >
> > > Do we have a consolidated resource with information and recommendations
> > > about use of the above? For instance, I ran a simple test using
> > > PerformanceEvaluation, examining just the size of data on disk for 1G of
> > > input data. The matrix below has some surprising results:
> > >
> > > +--------------------+--------------+
> > > | MODIFIER           | SIZE (bytes) |
> > > +--------------------+--------------+
> > > | none               |   1108553612 |
> > > +--------------------+--------------+
> > > | compression:SNAPPY |    427335534 |
> > > +--------------------+--------------+
> > > | compression:LZO    |    270422088 |
> > > +--------------------+--------------+
> > > | compression:GZ     |    152899297 |
> > > +--------------------+--------------+
> > > | codec:PREFIX       |   1993910969 |
> > > +--------------------+--------------+
> > > | codec:DIFF         |   1960970083 |
> > > +--------------------+--------------+
> > > | codec:FAST_DIFF    |   1061374722 |
> > > +--------------------+--------------+
> > > | codec:PREFIX_TREE  |   1066586604 |
> > > +--------------------+--------------+
> > >
> > > Where does a wayward soul look for guidance on which combination of the
> > > above to choose for their application?
> > >
> > > Thanks,
> > > Nick
> > >
> >
>
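
For anyone who wants to re-check the PerformanceEvaluation table quoted above, the following Python sketch (an illustration, not something posted in the thread) parses the quoted ASCII rows and lists the modifiers whose on-disk size exceeds the `none` baseline. The rows are copied from the quoted messages, including the later LZ4 entry; the helper names `parse_table` and `larger_than_baseline` are hypothetical.

```python
import re

# Rows copied from the quoted messages, quote prefixes included.
TABLE = """\
> > > | MODIFIER           | SIZE (bytes) |
> > > | none               |   1108553612 |
> > > | compression:SNAPPY |    427335534 |
> > > | compression:LZO    |    270422088 |
> > > | compression:GZ     |    152899297 |
> > > | codec:PREFIX       |   1993910969 |
> > > | codec:DIFF         |   1960970083 |
> > > | codec:FAST_DIFF    |   1061374722 |
> > > | codec:PREFIX_TREE  |   1066586604 |
> > | compression:LZ4    |    391017061 |
"""

# One data row: "| <name> | <digits> |". The header row has no digits
# in its second column, so it does not match.
ROW = re.compile(r"\|\s*([^|]+?)\s*\|\s*(\d+)\s*\|")


def parse_table(text):
    """Map each modifier name to its size in bytes."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def larger_than_baseline(sizes, baseline="none"):
    """Names of entries that take more space than the baseline."""
    return sorted(n for n, s in sizes.items() if s > sizes[baseline])


if __name__ == "__main__":
    sizes = parse_table(TABLE)
    for name in larger_than_baseline(sizes):
        print(f"{name}: {sizes[name] / sizes['none']:.2f}x the baseline")
```

On these numbers, the entries flagged are codec:PREFIX and codec:DIFF, which is the surprising part of the table.
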