HBase >> mail # user >> LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner?


Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner?
Unfortunately, I can only post some snapshots.

I have no region splits (I insert just 100,000 rows, so there is no
split, except when I don't use compression).

I use HBase 0.20.2, and to insert I use HTable.put(List<Put>).
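
For reference, a minimal sketch of that kind of batched insert against the
0.20 client API (the table name, row-key layout and values here are made up
for illustration, not taken from the actual test code):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

HBaseConfiguration config = new HBaseConfiguration();
HTable table = new HTable(config, "test");

// Build one batch of Puts, then send it to the server in a single call
List<Put> batch = new ArrayList<Put>();
for (int i = 0; i < 1000; i++) {
  // hypothetical row key "timestamp/deviceId", mimicking the rows shown below
  Put put = new Put(Bytes.toBytes(System.currentTimeMillis() + "/device" + i));
  put.add(Bytes.toBytes("data"), Bytes.toBytes("country"), Bytes.toBytes("France"));
  batch.add(put);
}
table.put(batch);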

The only difference between my 3 tests is the way I create the test
table:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression.Algorithm;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(config);

HTableDescriptor desc = new HTableDescriptor(name);

HColumnDescriptor colDesc;

colDesc = new HColumnDescriptor(Bytes.toBytes("meta:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

colDesc = new HColumnDescriptor(Bytes.toBytes("data:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

admin.createTable(desc);
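
To rule out a silent misconfiguration, one way to double-check that the
compression setting actually took effect is to read the descriptor back
after creation. A sketch, assuming the same admin, config and name variables
as above:

HTableDescriptor created = admin.getTableDescriptor(Bytes.toBytes(name));
for (HColumnDescriptor family : created.getFamilies()) {
  // prints e.g. "data -> GZ"; NONE here would mean the setting was lost
  System.out.println(family.getNameAsString() + " -> " + family.getCompressionType());
}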

A typical inserted row is made of 13 columns with short content,
as shown here:

1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:accuracy,    timestamp=1267006115356, value=1317
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:alt,         timestamp=1267006115356, value=0
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:country,     timestamp=1267006115356, value=France
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:countrycode, timestamp=1267006115356, value=FR
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:lat,         timestamp=1267006115356, value=48.65869706
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:locality,    timestamp=1267006115356, value=Morsang-sur-Orge
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:lon,         timestamp=1267006115356, value=2.36138182
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:postalcode,  timestamp=1267006115356, value=91390
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=data:region,      timestamp=1267006115356, value=Ile-de-France
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=meta:imei,        timestamp=1267006115356, value=6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=meta:infoid,      timestamp=1267006115356, value=ca30781e0c375a1236afbf323cbfa40dc2c7c7af
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=meta:locid,       timestamp=1267006115356, value=5e15a0281e83cfe55ec1c362f84a39f006f18128
1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730  column=meta:timestamp,   timestamp=1267006115356, value=1264761195240
Maybe LZO works much better with fewer rows that have bigger content?

On 24/02/10 19:10, Jean-Daniel Cryans wrote:
> Are you able to post the code used for the insertion? It could be
> something with your usage pattern or something wrong with the code
> itself.
>
> How many rows are you inserting? Do you even have some region splits?
>
> J-D
>
> On Wed, Feb 24, 2010 at 1:52 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>> Yes of course.
>>
>> We use a 4-machine cluster (4 large instances on AWS): 8 GB RAM each, dual-core
>> CPU. One machine hosts the Hadoop and HBase namenode / master, and 3 host
>> the datanodes / regionservers.
>>
>> The table used for testing is first created, then I sequentially insert a
>> set of rows and count the number of rows inserted per second.
>>
>> I insert rows in batches of 1000 (using HTable.put(List<Put>)).
>>
>> When reading, I also read sequentially, using a scanner (scanner caching
>> is set to 1024 rows); a sketch of this read appears at the end of this page.
>>
>> Maybe our installation of LZO is not good?
>>
>>
>> On 23/02/10 22:15, Jean-Daniel Cryans wrote:
>>>
>>> Vincent,
>>>
>>> I don't expect that either, can you give us more info about your test
>>> environment?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Tue, Feb 23, 2010 at 10:39 AM, Vincent Barat
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I did some testing to figure out which compression algo I should use for
>>>> my HBase tables. I thought that LZO was the right candidate, but it appears
>>>> that GZIP is the winner.
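
For reference, a minimal sketch of the scanner-based sequential read described
in the quoted messages above (caching set to 1024 rows; the table handle is
assumed to be open as in the insert sketch earlier on this page):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setCaching(1024); // fetch 1024 rows per RPC, as in the test setup
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result row : scanner) {
    // consume each row sequentially; Result holds all cells of the row
  }
} finally {
  scanner.close();
}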