Re: Documenting Guidance on compression and codecs
Here is another set of random data points:

Data generated with LoadTestTool, which inherently has some randomness.
However, these should be good for ballpark figures.

hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys
1000000 -read 100:30 -num_tables 1  -data_block_encoding NONE -tn
load_test_tool_NONE
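
The table names in the results suggest the same command was repeated once per data block encoding. A minimal sketch of generating those command lines, assuming only the -data_block_encoding value and the -tn table name change between runs (the compression runs are left out because the flag used for them is not shown in this thread; load_test_command is a hypothetical helper name):

```python
import shlex

# Encodings taken from the table-name suffixes in the results of this message.
ENCODINGS = ["NONE", "DIFF", "FAST_DIFF", "PREFIX", "PREFIX_TREE"]


def load_test_command(encoding):
    """Build the LoadTestTool argument list for one data block encoding.

    Assumption: every run used the flags of the one command shown above,
    with only the encoding and the table name substituted.
    """
    return [
        "hbase", "org.apache.hadoop.hbase.util.LoadTestTool",
        "-write", "1:10:100",
        "-num_keys", "1000000",
        "-read", "100:30",
        "-num_tables", "1",
        "-data_block_encoding", encoding,
        "-tn", f"load_test_tool_{encoding}",
    ]


if __name__ == "__main__":
    for enc in ENCODINGS:
        # Print only; hand the list to subprocess.run() to actually execute it.
        print(shlex.join(load_test_command(enc)))
```

This only prints the commands, so it can be checked on a machine without an HBase install.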

279723839  /apps/hbase/data/data/default/load_test_tool_NONE
103100244  /apps/hbase/data/data/default/load_test_tool_DIFF
103432465  /apps/hbase/data/data/default/load_test_tool_FAST_DIFF
134790042  /apps/hbase/data/data/default/load_test_tool_PREFIX
97963420  /apps/hbase/data/data/default/load_test_tool_PREFIX_TREE

78579277  /apps/hbase/data/data/default/load_test_tool_GZ
105321959  /apps/hbase/data/data/default/load_test_tool_SNAPPY
108040063  /apps/hbase/data/data/default/load_test_tool_LZO
110784379  /apps/hbase/data/data/default/load_test_tool_LZ4

78059199  /apps/hbase/data/data/default/load_test_tool_SNAPPY_FAST_DIFF
77214771  /apps/hbase/data/data/default/load_test_tool_LZO_FAST_DIFF
Enis
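
To make the sizes above easier to compare, here is a small sketch that expresses each table as a fraction of the load_test_tool_NONE size. It assumes the figures are bytes on disk and that the NONE table is a fair baseline; since LoadTestTool data is partly random, the ratios are ballpark only. The names SIZES and relative_sizes are just for illustration.

```python
# Sizes as reported in this message, keyed by table-name suffix.
SIZES = {
    "NONE": 279723839,
    "DIFF": 103100244,
    "FAST_DIFF": 103432465,
    "PREFIX": 134790042,
    "PREFIX_TREE": 97963420,
    "GZ": 78579277,
    "SNAPPY": 105321959,
    "LZO": 108040063,
    "LZ4": 110784379,
    "SNAPPY_FAST_DIFF": 78059199,
    "LZO_FAST_DIFF": 77214771,
}


def relative_sizes(sizes, baseline="NONE"):
    """Return each size as a fraction of the baseline entry's size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    ratios = relative_sizes(SIZES)
    # Smallest on-disk footprint first.
    for name in sorted(ratios, key=ratios.get):
        print(f"{name:<18} {SIZES[name]:>10}  {ratios[name]:6.1%}")
```

Sorting by ratio puts the combined compression plus FAST_DIFF tables and GZ at the top, which matches what can be read directly from the raw numbers.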

On Tue, Sep 24, 2013 at 1:11 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> According to this benchmark, LZ4 is faster than LZOP but consumes much more
> memory:
>
> http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
>
> Cheers
>
>
> On Wed, Sep 18, 2013 at 8:34 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Do you have any numbers on compression speed, too?
> > I continue to be surprised by the relative compression ratios between LZ4,
> > LZO, and SNAPPY.
> > I had expected SNAPPY and LZO to be roughly equivalent and LZ4 to be far
> > better than LZO.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Nick Dimiduk <[EMAIL PROTECTED]>
> > To: hbase-dev <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 18, 2013 5:19 PM
> > Subject: Re: Documenting Guidance on compression and codecs
> >
> >
> > For completeness, here's an entry for LZ4:
> >
> > +--------------------+--------------+
> > | compression:LZ4    |    391017061 |
> > +--------------------+--------------+
> >
> >
> >
> > On Wed, Sep 11, 2013 at 12:10 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >
> > > Do we have a consolidated resource with information and recommendations
> > > about use of the above? For instance, I ran a simple test using
> > > PerformanceEvaluation, examining just the size of data on disk for 1G of
> > > input data. The matrix below has some surprising results:
> > >
> > > +--------------------+--------------+
> > > | MODIFIER           | SIZE (bytes) |
> > > +--------------------+--------------+
> > > | none               |   1108553612 |
> > > +--------------------+--------------+
> > > | compression:SNAPPY |    427335534 |
> > > +--------------------+--------------+
> > > | compression:LZO    |    270422088 |
> > > +--------------------+--------------+
> > > | compression:GZ     |    152899297 |
> > > +--------------------+--------------+
> > > | codec:PREFIX       |   1993910969 |
> > > +--------------------+--------------+
> > > | codec:DIFF         |   1960970083 |
> > > +--------------------+--------------+
> > > | codec:FAST_DIFF    |   1061374722 |
> > > +--------------------+--------------+
> > > | codec:PREFIX_TREE  |   1066586604 |
> > > +--------------------+--------------+
> > >
> > > Where does a wayward soul look for guidance on which combination of the
> > > above to choose for their application?
> > >
> > > Thanks,
> > > Nick
> > >
> >
>
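
For the PerformanceEvaluation matrix quoted above, the surprising part is easier to see when each row is compared against the "none" row. A rough sketch, with the rows copied from the two quoted tables (including the later LZ4 entry); parse_rows and larger_than_baseline are hypothetical helper names, and nothing here explains why the sizes came out this way:

```python
import re

# Rows copied from the quoted tables; sizes are in bytes.
TABLE = """\
| none               |   1108553612 |
| compression:SNAPPY |    427335534 |
| compression:LZO    |    270422088 |
| compression:GZ     |    152899297 |
| compression:LZ4    |    391017061 |
| codec:PREFIX       |   1993910969 |
| codec:DIFF         |   1960970083 |
| codec:FAST_DIFF    |   1061374722 |
| codec:PREFIX_TREE  |   1066586604 |
"""

# One data row: "| <modifier> | <size> |"; border lines do not match.
ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*(\d+)\s*\|$")


def parse_rows(text):
    """Map each modifier name to its size, skipping non-data lines."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match.group(1)] = int(match.group(2))
    return rows


def larger_than_baseline(rows, baseline="none"):
    """Return the modifiers whose on-disk size exceeds the baseline row."""
    base = rows[baseline]
    return sorted(name for name, size in rows.items() if size > base)


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    for name, size in rows.items():
        print(f"{name:<20} {size / rows['none']:.2f}x")
    print("larger than uncompressed:", larger_than_baseline(rows))
```

In this data set only codec:PREFIX and codec:DIFF come out larger than the unmodified table, while FAST_DIFF and PREFIX_TREE are only slightly smaller than it.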