HBase >> mail # user >> Long client pauses with compression
Long client pauses with compression
I am using the Java client API to write 10,000 rows with about 6,000 columns each, using 8 threads that each make multiple calls to the HTable.put(List<Put>) method. I start with an empty table that has one column family and no pre-created regions.

With compression turned off, I see very stable performance. At the start there are a couple of 10-20 second pauses where all insert threads are blocked during a region split. Subsequent splits do not block all of the threads, presumably because there are more regions by then, so no single region split blocks every insert. GC in HBase during the insert is not a major problem (6k/55sec).

When using either LZO or gzip compression, however, I see frequent and long pauses, sometimes around 20 seconds but often over 80 seconds in my test. During these pauses all 8 of the threads writing to HBase are blocked. The pauses happen throughout the insert process. GC activity in HBase is higher with compression (60k, 4min), but that doesn't seem like enough to explain these pauses. Overall performance suffers dramatically as a result (about 2x slower).
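
For anyone who wants to reproduce or quantify pauses like these from the client side, one option is to time every batch call in each writer thread and record the calls that exceed a threshold. The following is only a minimal sketch under stated assumptions: `BatchSink` and `PauseProbe` are hypothetical names invented for illustration, not HBase types and not the code used in the test above. `BatchSink.write` stands in for a call such as HTable.put(List<Put>), and the batches are plain strings rather than real Put objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a blocking batch write such as HTable.put(List<Put>).
interface BatchSink {
    void write(List<String> batch) throws Exception;
}

final class PauseProbe {
    // Starts `threads` writers, each sending `batchesPerThread` batches of
    // `rowsPerBatch` rows, and returns the duration in milliseconds of every
    // write call that took at least `thresholdMs`.
    static List<Long> run(final BatchSink sink, int threads, final int batchesPerThread,
                          final int rowsPerBatch, final long thresholdMs) {
        final ConcurrentLinkedQueue<Long> slowCalls = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int b = 0; b < batchesPerThread; b++) {
                    // Build the batch before starting the clock, so only the write is timed.
                    List<String> batch = new ArrayList<>();
                    for (int r = 0; r < rowsPerBatch; r++) {
                        batch.add("row-" + id + "-" + b + "-" + r);
                    }
                    long start = System.nanoTime();
                    try {
                        sink.write(batch);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    if (ms >= thresholdMs) {
                        slowCalls.add(ms);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return new ArrayList<>(slowCalls);
    }

    public static void main(String[] args) {
        // Simulated sink: every call sleeps 50 ms, so every call exceeds the 20 ms threshold.
        List<Long> slow = run(batch -> Thread.sleep(50), 8, 5, 10, 20);
        System.out.println("slow calls: " + slow.size());
    }
}
```

In a real test the sink would wrap the actual put call, with one table handle per thread. Logging a timestamp alongside each slow call would make it possible to line the pauses up against region server logs and GC logs.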

I have tested this in different configurations (single node, 4 nodes) with the same result. I'm using HBase 0.90.1 (CDH3B4), Sun/Oracle Java 1.6.0_24, CentOS 5.5, and Hadoop LZO 0.4.10 from Cloudera. The machines have 12 cores and 24 GB of RAM. Settings are pretty much default, nothing out of the ordinary. I tried adjusting the region handler count and memstore settings, but these had no effect.