Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???
The impact of my cluster architecture on performance is
obviously the same in all 3 of my test cases. Given that I only change
the compression type between tests, I don't understand why changing
the number of regions or anything else would change the speed ratio
between my tests, especially between the GZIP and LZO tests.

Are there any ready-to-use, easy-to-set-up benchmarks I could use
to try to reproduce the issue in a well-known environment?
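
As a rough aid for reasoning about the data itself, here is a minimal sketch that uses only the JDK to check how compressible cell-shaped text is under GZIP. It does not touch HBase and says nothing about write speed. The class name, the column subset, and the synthetic values are assumptions made for illustration, loosely modelled on the row quoted further down. LZO is left out because the JDK does not ship an LZO codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of HBase: measures GZIP size of synthetic cell text.
final class GzipRatioSketch {

    /** Returns the size in bytes of the GZIP-compressed form of raw. */
    static int gzipSize(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(raw);
        } catch (IOException e) {
            // Not expected for an in-memory sink; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    /** Builds text that imitates the quoted rows: the row key is repeated for every cell. */
    static byte[] syntheticCells(int rows) {
        // Assumed subset of the column names that appear in the quoted row.
        String[] columns = {
            "data:accuracy", "data:alt", "data:country", "data:lat", "data:lon", "meta:imei"
        };
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            String rowKey = (1264761195240L + r) + "/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730";
            for (String column : columns) {
                // Made-up short value; real values would differ.
                sb.append(rowKey).append(' ').append(column).append(' ').append(r % 100).append('\n');
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = syntheticCells(1000);
        int packed = gzipSize(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
                raw.length, packed, (double) raw.length / packed);
    }
}
```

Running `main` prints the raw size, the compressed size, and their ratio for 1000 synthetic rows. Swapping in text dumped from a real table would give a more meaningful number than the synthetic values used here.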

On 25/02/10 19:29, Jean-Daniel Cryans wrote:
> With only 1 region, providing more than one node will probably just
> slow down the test since the load is handled by one machine which has
> to replicate blocks 2 times. I think your test would have much more
> value if you really grew at least to 10 regions. Also make sure to run
> the tests more than once on completely new hbase setups (drop table +
> restart should be enough).
>
> May I also recommend upgrading to hbase 0.20.3? It will provide a
> better experience in general.
>
> J-D
>
> On Thu, Feb 25, 2010 at 2:49 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>> Unfortunately I can post only some snapshots.
>>
>> There is no region split (I insert only 100000 rows, so no split occurs,
>> except when I don't use compression).
>>
>> I use HBase 0.20.2, and I insert with HTable.put(List<Put>).
>>
>> The only difference between my 3 tests is the way I create the test table:
>>
>> HBaseAdmin admin = new HBaseAdmin(config);
>>
>> HTableDescriptor desc = new HTableDescriptor(name);
>>
>> HColumnDescriptor colDesc;
>>
>> colDesc = new HColumnDescriptor(Bytes.toBytes("meta:"));
>> colDesc.setMaxVersions(1);
>> colDesc.setCompressionType(Algorithm.GZ); // or Algorithm.LZO / Algorithm.NONE
>> desc.addFamily(colDesc);
>>
>> colDesc = new HColumnDescriptor(Bytes.toBytes("data:"));
>> colDesc.setMaxVersions(1);
>> colDesc.setCompressionType(Algorithm.GZ); // or Algorithm.LZO / Algorithm.NONE
>> desc.addFamily(colDesc);
>>
>> admin.createTable(desc);
>>
>> A typical inserted row is made of 13 columns with short content, as shown
>> here:
>>
>> Row key: 1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
>>
>>   column=data:accuracy, timestamp=1267006115356, value=1317
>>   column=data:alt, timestamp=1267006115356, value=0
>>   column=data:country, timestamp=1267006115356, value=France
>>   column=data:countrycode, timestamp=1267006115356, value=FR
>>   column=data:lat, timestamp=1267006115356, value=48.65869706
>>   column=data:locality, timestamp=1267006115356, value=Morsang-sur-Orge
>>   column=data:lon, timestamp=1267006115356, value=2.36138182
>>   column=data:postalcode, timestamp=1267006115356, value=91390
>>   column=data:region, timestamp=1267006115356, value=Ile-de-France
>>   column=meta:imei, timestamp=1267006115356, value=6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
>>   column=meta:infoid, timestamp=1267006115356, value=ca30781e0c375a1236afbf323cbfa40dc2c7c7af
>>   column=meta:locid, timestamp=1267006115356, value=5e15a0281e83cfe55ec1c362f84a39f006f18128
>>   column=meta:timestamp, timestamp=1267006115356, value=1264761195240
>>
>> Maybe LZO works much better with fewer rows with bigger content?
>>
>> On 24/02/10 19:10, Jean-Daniel Cryans wrote:
>>>
>>> Are you able to post the code used for the insertion? It could be