Re: Long client pauses with compression
I changed the settings as described below:

hbase.hstore.blockingStoreFiles=20
hbase.hregion.memstore.block.multiplier=4
MAX_FILESIZE=512mb
MEMSTORE_FLUSHSIZE=128mb

I also pre-created the table with 6 regions; previously I wasn't pre-splitting at all. I needed to make all of these changes together to entirely eliminate the very long pauses. Now there are no pauses much longer than a second.
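For what it's worth, here is a minimal sketch of how the table-level part of this can be done through the Java client API of that vintage (the first two settings above are server-side and belong in hbase-site.xml on the region servers, as shown further down). The table name, family name, and split keys are hypothetical placeholders, not anything from the original thread:

    // Sketch only: create a table pre-split into 6 regions with
    // MAX_FILESIZE and MEMSTORE_FLUSHSIZE set at the table level.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePresplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable");     // hypothetical name
        desc.setMaxFileSize(512L * 1024 * 1024);                     // MAX_FILESIZE = 512mb
        desc.setMemStoreFlushSize(128L * 1024 * 1024);               // MEMSTORE_FLUSHSIZE = 128mb
        desc.addFamily(new HColumnDescriptor("f"));                  // compression is set per-family

        // 5 split keys yield 6 initial regions; real keys must match
        // the distribution of your row keys.
        byte[][] splits = new byte[][] {
          Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3"),
          Bytes.toBytes("4"), Bytes.toBytes("5")
        };
        admin.createTable(desc, splits);
      }
    }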

Thanks much for the help. I am still not entirely sure why compression seems to expose this problem, however.
On Mar 14, 2011, at 11:54 AM, Jean-Daniel Cryans wrote:

> Alright so here's a preliminary report:
>
> - No compression is stable for me too, short pauses.
> - LZO gave me no problems either, generally faster than no compression.
> - GZ initially gave me weird results, but I quickly saw that I forgot
> to copy over the native libs from the hadoop folder so my logs were
> full of:
>
> 2011-03-14 10:20:29,624 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,626 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,628 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,630 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,632 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,634 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
> 2011-03-14 10:20:29,636 INFO org.apache.hadoop.io.compress.CodecPool:
> Got brand-new compressor
>
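A quick way to confirm whether the native libs are actually visible to the JVM is to ask Hadoop's NativeCodeLoader directly; a minimal sketch, assuming it runs with the same classpath and java.library.path as the region servers:

    // Prints whether the native hadoop library was found on java.library.path.
    // If false, the GZ codec falls back to the pure-Java zlib implementation.
    import org.apache.hadoop.util.NativeCodeLoader;

    public class NativeCheck {
      public static void main(String[] args) {
        System.out.println("native hadoop loaded: "
            + NativeCodeLoader.isNativeCodeLoaded());
      }
    }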
> I copied the libs over, bounced the region servers, and the
> performance was much more stable until, at one point, I got a
> 20-second pause; looking at the logs I saw:
>
> 2011-03-14 10:31:17,625 WARN
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
> test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744. has too many
> store files; delaying flush up to 90000ms
>
> (our config sets the blocking threshold at 20 store files instead of
> the default, which is around 12 IIRC)
>
> Quickly followed by a bunch of:
>
> 2011-03-14 10:31:26,757 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for
> 'IPC Server handler 20 on 60020' on region
> test,,1300127266461.9d0eb095b77716c22cd5c78bb503c744.: memstore size
> 285.6m is >= than blocking 256.0m size
>
> (our settings mean we won't block on memstores until they reach 4x
> the flush size, i.e. 4 x 64MB = 256MB here; with the default 2x
> multiplier, as in your case, you'd block at 128MB)
>
> The reason is that our memstores, once flushed, occupy very little
> space on disk; consider this:
>
> 2011-03-14 10:31:16,606 INFO
> org.apache.hadoop.hbase.regionserver.Store: Added
> hdfs://sv2borg169:9000/hbase/test/9d0eb095b77716c22cd5c78bb503c744/test/420552941380451032,
> entries=216000, sequenceid=70556635737, memsize=64.3m, filesize=6.0m
>
> It means flushes create tiny files of ~6MB (the 64.3MB memstore above
> compressed roughly 10:1 on disk), and the compactor will spend all
> its time merging those files, up to the point where HBase must stop
> inserting so as not to blow its available memory. Thus, the same data
> gets rewritten a couple of times.
>
> Normally, meaning a system where you're not just trying to insert
> data ASAP but where most of your workload is made up of reads, this
> works well: the memstores fill much more slowly and compactions
> happen at a normal pace.
>
> If you search around the interwebs for tips on speeding up HBase
> inserts, you'll often see the configs I referred to earlier:
>
>  <property>
>    <name>hbase.hstore.blockingStoreFiles</name>
>    <value>20</value>
>  </property>
> and
>  <property>
>    <name>hbase.hregion.memstore.block.multiplier</name>
>    <value>4</value>
>  </property>
>
> They should work pretty well for most use cases made up of heavy
> writes, given that the region servers have enough heap (e.g. more
> than 3 or 4GB). You should also consider setting MAX_FILESIZE to >1GB
> to limit the number of regions, and MEMSTORE_FLUSHSIZE to >128MB to
> flush bigger files.