Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???
My cluster architecture obviously affects performance in the same
way in all 3 of my test cases. Given that I only change the
compression type between tests, I don't understand why changing the
number of regions or anything else would change the speed ratio
between my tests, especially between the GZIP and LZO tests.

Are there any ready-to-use, easy-to-set-up benchmarks I could use
to try to reproduce the issue in a well-known environment?
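
For comparing runs in the meantime, here is a minimal sketch of a repeated-run timer, assuming each test (drop table, recreate it with one compression type, insert the rows) can be wrapped in a Runnable. The class and method names are hypothetical and are not part of HBase; the workloads in main are empty placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an HBase API: times a workload several times per variant.
class RepeatedRunTimer {

    /** Runs the task `runs` times and returns each wall-clock duration in nanoseconds. */
    static List<Long> timeRuns(Runnable task, int runs) {
        List<Long> durations = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            durations.add(System.nanoTime() - start);
        }
        return durations;
    }

    /** Median of a non-empty sample list; less sensitive to one slow run than the mean. */
    static long median(List<Long> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("need at least one sample");
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2;
    }

    public static void main(String[] args) {
        // Placeholder workloads: replace each body with the real
        // "drop table + recreate with this codec + insert rows" test.
        Map<String, Runnable> variants = new LinkedHashMap<>();
        variants.put("NONE", () -> { });
        variants.put("GZ", () -> { });
        variants.put("LZO", () -> { });

        for (Map.Entry<String, Runnable> v : variants.entrySet()) {
            List<Long> runs = timeRuns(v.getValue(), 5);
            System.out.println(v.getKey() + " median ns: " + median(runs) + " all: " + runs);
        }
    }
}
```

Printing every run next to the median makes it easy to see whether the GZIP/LZO ratio is stable or whether a single run dominates it.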

On 25/02/10 19:29, Jean-Daniel Cryans wrote:
> If there is only 1 region, providing more than one node will probably
> just slow down the test, since the load is handled by one machine which
> has to replicate blocks 2 times. I think your test would have much more
> value if you really grew at least to 10 regions. Also make sure to run
> the tests more than once on completely new hbase setups (drop table +
> restart should be enough).
>
> May I also recommend upgrading to hbase 0.20.3? It will provide a
> better experience in general.
>
> J-D
>
> On Thu, Feb 25, 2010 at 2:49 AM, Vincent Barat<[EMAIL PROTECTED]>  wrote:
>> Unfortunately I can post only some snapshots.
>>
>> I have no region split (I insert just 100000 rows so there is no split,
>> except when I don't use compression).
>>
>> I use HBase 0.20.2, and to insert I use HTable.put(List<Put>).
>>
>> The only difference between my 3 tests is the way I create the test table:
>>
>> // Create the test table; only the compression algorithm changes between tests.
>> HBaseAdmin admin = new HBaseAdmin(config);
>> HTableDescriptor desc = new HTableDescriptor(name);
>> HColumnDescriptor colDesc;
>>
>> // "meta" column family
>> colDesc = new HColumnDescriptor(Bytes.toBytes("meta:"));
>> colDesc.setMaxVersions(1);
>> colDesc.setCompressionType(Algorithm.GZ); // or Algorithm.LZO, or Algorithm.NONE
>> desc.addFamily(colDesc);
>>
>> // "data" column family
>> colDesc = new HColumnDescriptor(Bytes.toBytes("data:"));
>> colDesc.setMaxVersions(1);
>> colDesc.setCompressionType(Algorithm.GZ); // or Algorithm.LZO, or Algorithm.NONE
>> desc.addFamily(colDesc);
>>
>> admin.createTable(desc);
>>
>> A typical inserted row is made of 13 columns with short content, as shown
>> here:
>>
>> Row key: 1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
>> (every cell has timestamp=1267006115356)
>>
>>   column            value
>>   data:accuracy     1317
>>   data:alt          0
>>   data:country      France
>>   data:countrycode  FR
>>   data:lat          48.65869706
>>   data:locality     Morsang-sur-Orge
>>   data:lon          2.36138182
>>   data:postalcode   91390
>>   data:region       Ile-de-France
>>   meta:imei         6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
>>   meta:infoid       ca30781e0c375a1236afbf323cbfa40dc2c7c7af
>>   meta:locid        5e15a0281e83cfe55ec1c362f84a39f006f18128
>>   meta:timestamp    1264761195240
>>
>> Maybe LZO works much better with fewer rows with bigger content?
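
One way to probe the content-size question is a rough sketch like the one below, using only java.util.zip. It covers GZIP only, since the JDK ships no LZO codec. It relies on the fact that HFile compresses whole blocks rather than individual cells, so the unit being compressed is many short cells together. The class name, helper names, and the fake row layout are assumptions for illustration and do not reproduce the HFile format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compares GZIP on one short value vs. a block of many short cells.
class BlockCompressionSketch {

    /** GZIP-compresses the input fully in memory. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Builds fake rows that repeat the row key once per column, as in the sample row. */
    static byte[] buildBlock(int rows) {
        String[] columns = {"data:country", "data:countrycode", "data:region", "meta:timestamp"};
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            // Illustrative key only: a varying timestamp plus the hash from the sample row.
            String rowKey = (1264761195240L + r) + "/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730";
            for (String col : columns) {
                sb.append(rowKey).append(col).append("value").append(r);
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] oneCell = "France".getBytes(StandardCharsets.UTF_8);
        byte[] block = buildBlock(200);
        System.out.printf("single short value: %d -> %d bytes%n",
                oneCell.length, gzip(oneCell).length);
        System.out.printf("block of cells:     %d -> %d bytes%n",
                block.length, gzip(block).length);
    }
}
```

Because the row key is repeated for every column, a block of such cells is highly redundant even though each value is short. This says nothing about LZO's speed; it only suggests that short values alone do not make the data hard to compress.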
>>
>> On 24/02/10 19:10, Jean-Daniel Cryans wrote:
>>>
>>> Are you able to post the code used for the insertion? It could be