HBase, mail # user - LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???


Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???
Vincent Barat 2010-02-24, 09:52
Yes of course.

We use a 4-machine cluster (4 large instances on AWS), each with 8 GB
of RAM and a dual-core CPU. One machine hosts the Hadoop namenode and
the HBase master, and the other three host the datanodes / regionservers.

The table used for testing is created first; then I insert a set of
rows sequentially and count the number of rows inserted per second.

I insert rows in batches of 1000 (using HTable.put(List<Put>)).
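
To make the measurement described above concrete, here is a minimal sketch of a batching loop that reports rows per second. It is not the code used in this test: the class `BatchThroughput`, its `Result` type, and the `sink` parameter are hypothetical names introduced for illustration, and the sink merely stands in for a call such as HTable.put(List<Put>).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: sends rows to a sink in fixed-size batches and times the run. */
final class BatchThroughput {

    /** Totals for one run. */
    static final class Result {
        final long rows;          // rows handed to the sink
        final int batches;        // number of sink calls
        final long elapsedNanos;  // wall-clock time of the whole loop

        Result(long rows, int batches, long elapsedNanos) {
            this.rows = rows;
            this.batches = batches;
            this.elapsedNanos = elapsedNanos;
        }

        /** Rows per second, or NaN if no time was measured. */
        double rowsPerSecond() {
            return elapsedNanos == 0 ? Double.NaN : rows * 1e9 / elapsedNanos;
        }
    }

    /**
     * Groups rows into batches of batchSize and passes each batch to the sink.
     * The sink is an assumption: in an HBase test it would wrap the batched put.
     */
    static <T> Result run(Iterable<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long count = 0;
        int batches = 0;
        long start = System.nanoTime();
        for (T row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                count += batch.size();
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {  // flush the final, partially filled batch
            sink.accept(batch);
            count += batch.size();
            batches++;
        }
        return new Result(count, batches, System.nanoTime() - start);
    }
}
```

With a batch size of 1000, 2500 rows would reach the sink as three calls (1000, 1000, 500). In a real run the sink would have to handle the IOException that the HBase client can throw, and the timing includes the cost of building each batch, so the figure is only a rough rows/s estimate.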

When reading, I also read sequentially, using a scanner (scanner
caching is set to 1024 rows).

Maybe our LZO installation is faulty?
On 23/02/10 22:15, Jean-Daniel Cryans wrote:
> Vincent,
>
> I don't expect that either, can you give us more info about your test
> environment?
>
> Thx,
>
> J-D
>
> On Tue, Feb 23, 2010 at 10:39 AM, Vincent Barat
> <[EMAIL PROTECTED]>  wrote:
>> Hello,
>>
>> I did some testing to figure out which compression algorithm I should use
>> for my HBase tables. I thought that LZO was the right candidate, but it
>> appears to be the worst one.
>>
>> I use one table with 2 families and 10 columns. Each row has a total of 200
>> to 400 bytes.
>>
>> Here are my results:
>>
>> GZIP:           2600 to 3200 inserts/s  12000 to 15000 reads/s
>> NO COMPRESSION: 2000 to 2600 inserts/s   4900 to  5020 reads/s
>> LZO:            1600 to 2100 inserts/s   4020 to  4600 reads/s
>>
>> Do you have an explanation for this? I thought that LZO compression was
>> always faster at compression and decompression than GZIP?
>>
>>
>>
>