HBase >> mail # user >> LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???


Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???


On 01/03/10 01:20, Dan Washusen wrote:
> My (very rough) calculation of the data size came up with around 50MB.  That
> was assuming 400 bytes * 100,000 for the values, 32 + 8 * 13 * 100,000 for
> the keys and an extra meg or two for extra key stuff.  I didn't understand
> how that resulted in a region split, so I assume we are still missing
> some information (or I made a mistake).  As you mention, that should mean
> that everything is in the MemStore and compression has not come into play
> yet.  Puzzling...
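
The rough estimate quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the grouping of the key term as (32 + 8 * 13) bytes per row is an assumption, chosen because it lands closest to the quoted "around 50MB"; the other plausible grouping, (32 + 8) * 13, is shown for comparison. The 2 MB allowance for "extra key stuff" and the 64 MiB flush threshold are also assumptions (the latter is the commonly cited default of hbase.hregion.memstore.flush.size for that era; check your own configuration).

```python
# Rough data-size estimate for 100,000 rows, following the quoted figures.
# ASSUMPTIONS (not confirmed by the thread): the per-row key size grouping,
# the 2 MB "extra key stuff" allowance, and the 64 MiB flush threshold.

ROWS = 100_000
VALUE_BYTES = 400                      # per row, from the quoted message
ASSUMED_FLUSH_BYTES = 64 * 1024 ** 2   # assumed memstore flush threshold


def estimate_bytes(rows=ROWS, value_bytes=VALUE_BYTES,
                   key_bytes=32 + 8 * 13, extra_bytes=2_000_000):
    """Return the estimated total size in bytes (values + keys + extras)."""
    return rows * (value_bytes + key_bytes) + extra_bytes


def fits_in_memstore(total_bytes, flush_bytes=ASSUMED_FLUSH_BYTES):
    """True if the estimate stays below the assumed flush threshold."""
    return total_bytes < flush_bytes


if __name__ == "__main__":
    for label, key_bytes in (("(32 + 8 * 13)", 32 + 8 * 13),
                             ("(32 + 8) * 13", (32 + 8) * 13)):
        total = estimate_bytes(key_bytes=key_bytes)
        print(f"key bytes/row = {label}: {total / 1e6:.1f} MB, "
              f"fits in memstore: {fits_in_memstore(total)}")
```

Under the first grouping the data would stay below the assumed threshold, which is consistent with the puzzlement expressed in the thread; under the second it would not.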

You are right, there is no region split when I use no compression.
Nevertheless, as you say, if everything is in the memstore, how can
it be that I see such a big difference between my tests?

>
> On PE: there isn't currently a way to specify compression options on the
> testtable without extending PE and overriding the
> org.apache.hadoop.hbase.PerformanceEvaluation#getTableDescriptor method.
> Maybe it could be added as an option?
>
> Cheers,
> Dan
>
> On 1 March 2010 10:56, Jean-Daniel Cryans<[EMAIL PROTECTED]>  wrote:
>
>> As Dan said, your data is so small you don't really trigger many
>> different behaviors in HBase; it could very well be kept mostly in the
>> memstores, where compression has no impact at all.
>>
>> WRT a benchmark, there's the PerformanceEvaluation (we call it PE for
>> short) which is well maintained and lets you set a compression level.
>> This page has an outdated help but it shows you how to run it:
>> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
>>
>> Another option is importing the wikipedia dump, which is highly
>> compressible and not manufactured like the PE. Last summer I wrote a
>> small MR job to do the import easily and although the code is based on
>> a dev version 0.20.0, it should be fairly easy to make it work on
>> 0.20.3 (probably just replacing the libs). See
>> http://code.google.com/p/hbase-wikipedia-loader/
>>
>> See the last paragraph of the Getting Started in the Wiki, I show some
>> import numbers:
>>
>> "For example, it took 29 min on a 6-node cluster (1 master and 5
>> region servers) with the same hardware (AMD Phenom(tm) 9550 Quad, 8GB,
>> 2x1TB disks), 2 map slots per task tracker (that's 10 parallel maps),
>> and GZ compression. With LZO and a new table it took 23 min 20 sec.
>> Compressed the table is 32 regions big, uncompressed it's 93 and took
>> 30 min 10 sec to import."
>>
>> You can see that the import was a lot faster on LZO. I didn't do any
>> reading test tho...
>>
>> Good luck!
>>
>> J-D
>>
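
To put the import numbers quoted above side by side, here is a small sketch that converts the three timings to seconds and computes how much import time LZO saved relative to GZ and to no compression, plus the region-count ratio. The figures come straight from the quoted paragraph; the helper names are made up for illustration, and the quote does not say which compressed table the 32-region figure refers to.

```python
# Compare the import timings quoted from the wiki paragraph.
# Helper names are hypothetical; the numbers are the ones quoted above.

def to_seconds(minutes, seconds=0):
    """Convert a 'M min S sec' duration to seconds."""
    return minutes * 60 + seconds


IMPORT_SECONDS = {
    "GZ": to_seconds(29),        # 29 min
    "LZO": to_seconds(23, 20),   # 23 min 20 sec
    "none": to_seconds(30, 10),  # 30 min 10 sec
}

# 93 regions uncompressed vs 32 compressed, as quoted.
REGION_RATIO = 93 / 32


def time_saved_pct(fast, slow):
    """Percentage of import time saved by the faster run."""
    return 100.0 * (slow - fast) / slow


if __name__ == "__main__":
    lzo = IMPORT_SECONDS["LZO"]
    for other in ("GZ", "none"):
        pct = time_saved_pct(lzo, IMPORT_SECONDS[other])
        print(f"LZO vs {other}: {pct:.1f}% less import time")
    print(f"uncompressed/compressed regions: {REGION_RATIO:.2f}x")
```

These are write-side numbers only; as noted in the message, no read test was run.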
>> On Sun, Feb 28, 2010 at 9:30 AM, Vincent Barat<[EMAIL PROTECTED]>
>> wrote:
>>> The impact of my cluster architecture on performance is obviously the
>>> same in my 3 test cases. Given that I only change the compression type
>>> between tests, I don't understand why changing the number of regions or
>>> whatever else would change the speed ratio between my tests, especially
>>> between the GZIP & LZO tests.
>>>
>>> Are there any ready-to-use, easy-to-set-up benchmarks I could use to try
>>> to reproduce the issue in a well-known environment?
>>>
>>> On 25/02/10 19:29, Jean-Daniel Cryans wrote:
>>>>
>>>> If there is only 1 region, providing more than one node will probably
>>>> just slow down the test, since the load is handled by one machine which
>>>> has to replicate blocks 2 times. I think your test would have much more
>>>> value if you really grew to at least 10 regions. Also make sure to run
>>>> the tests more than once on completely new hbase setups (drop table +
>>>> restart should be enough).
>>>>
>>>> May I also recommend upgrading to hbase 0.20.3? It will provide a
>>>> better experience in general.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Feb 25, 2010 at 2:49 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Unfortunately I can post only some snapshots.
>>>>>
>>>>> I have no region split (I insert just 100000 rows so there is no split,
>>>>> except when I don't use compression).
>>>>>
>>>>> I use HBase 0.20.2 and to insert I use HTable.put(List<Put>);