
Re: how to make tuning for hbase (every couple of days hbase region sever/s crashe)
Sorry, I missed that you were talking about the OOME (the exceptions I
looked at were socket timeouts).
Can you share the log snippet where it OOME'd? I want to explore this use
case :)

You have about 200 regions per server, and with each region configured to
500MB that makes up to 100GB of data per server.
A Region is considered open once the index blocks of all its StoreFiles
have been read; the default HFile block size is 64KB. A larger block size
will help reduce the index size of each StoreFile. As Chris said, looking at
the RS metrics will give a lot of useful info, such as storefileindex and
blockcache.
I think that only increasing the Region size to 500MB will not reduce the
memory footprint (apart from reducing Region splits and the number of
entries in '.META.'), as one has to deal with the StoreFiles eventually.
Yes, reducing the size of the KeyValue coordinates (row key, column family,
qualifier) will help limit the index size (I am sure you already have an
optimised schema).
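
To put rough numbers on the above, here is a back-of-the-envelope sketch in Python. It assumes roughly one index entry per data block and a made-up average index entry size of 100 bytes (AVG_INDEX_ENTRY_BYTES is my assumption, not something measured on your cluster), so treat the result as an order of magnitude only and compare it with the storefileindex metric on the RS.

```python
# Back-of-the-envelope estimate of per-server data and HFile block-index size.
# AVG_INDEX_ENTRY_BYTES is an assumed figure for illustration, not a measured one.

KB = 1024
MB = 1024 ** 2


def index_estimate(regions, region_size_bytes, block_size_bytes, avg_index_entry_bytes):
    """Return (data bytes, number of data blocks, estimated index bytes)."""
    data_bytes = regions * region_size_bytes
    # Roughly one index entry per data block.
    blocks = data_bytes // block_size_bytes
    return data_bytes, blocks, blocks * avg_index_entry_bytes


AVG_INDEX_ENTRY_BYTES = 100  # assumption: key plus offset/size bookkeeping

for block_kb in (64, 128, 256):
    data, blocks, index_bytes = index_estimate(
        200, 500 * MB, block_kb * KB, AVG_INDEX_ENTRY_BYTES
    )
    print(
        f"block={block_kb}KB data={data / MB:,.0f}MB "
        f"blocks={blocks:,} index~{index_bytes / MB:,.1f}MB"
    )
```

Doubling the block size halves the number of index entries in this model, which is why a larger block size shrinks the index, while a larger Region size on its own leaves the total number of blocks unchanged.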

Your gc-log snapshot says that CMS failed to free even 1 byte, and then fell
back on a "stop-the-world" GC. Does this mean there was literally no garbage
in the heap during that time window? Or maybe your app was writing heavily
and concurrently to the RS. Since not even a single byte was freed, enabling
MSLAB will not help (if you haven't enabled it yet), as it is meant to
reduce fragmentation of the freed space, because CMS doesn't do any
compaction on its own.

What did you eventually do to sort out this error, Oleg? Did bumping the RS
heap fix it? Are you using compression while writing to HBase?
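
As a rough check on the memstore settings quoted below, here is a small Python sketch of what they translate to in bytes. The 4GB heap is only an assumption for illustration (I don't know your actual RS heap size); the other values are the ones from your mail.

```python
# Rough memstore budget for one RegionServer, using the settings quoted below.
# HEAP_BYTES is an assumption for illustration; plug in the real RS heap size.

GB = 1024 ** 3


def memstore_budget(heap_bytes, upper_limit, lower_limit, flush_size, block_multiplier):
    """Translate the memstore-related settings into byte thresholds."""
    upper = int(heap_bytes * upper_limit)  # updates are blocked above this total
    lower = int(heap_bytes * lower_limit)  # forced flushes aim to get back under this
    return {
        "global_upper_bytes": upper,
        "global_lower_bytes": lower,
        # A single region blocks updates once its memstore reaches this size.
        "per_region_block_bytes": flush_size * block_multiplier,
        # How many regions can fill a memstore to flush size before the
        # global upper limit is hit.
        "regions_at_flush_size_before_upper_limit": upper // flush_size,
    }


HEAP_BYTES = 4 * GB  # assumed heap size
budget = memstore_budget(HEAP_BYTES, 0.4, 0.35, 67108864, 2)
for name, value in budget.items():
    print(f"{name}: {value:,}")
```

With about 200 regions per server, only a small fraction of them can hold a memstore near flush size before the global limit kicks in, so heavy writes spread over many regions will trigger forced flushes well before each region reaches 64MB.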


On Thu, Aug 25, 2011 at 9:41 AM, Chris Tarnas <[EMAIL PROTECTED]> wrote:

> On Aug 25, 2011, at 1:55 AM, Oleg Ruchovets wrote:
> > Thank you very much for your post. It is very similar to what is
> > happening in our environment.
> >
> >    Before we increase the heap size we want to do some tuning of the
> > HBase memstore and related components.
> >
> We run with 12-15GB of heap; 4GB was only enough for us in very small test
> DBs.
> > 1)  Currently our configuration parameters related to memstore are
> > default.
> >
> >     --  hbase.regionserver.global.memstore.upperLimit=0.4
> >
> >     --  hbase.regionserver.global.memstore.lowerLimit=0.35
> >
> >     --  hbase.hregion.memstore.flush.size=67108864
> >
> >     --  hbase.hregion.memstore.block.multiplier=2
> >
> >     --  hbase.hstore.compactionThreshold=3
> >
> >     --  hbase.hstore.blockingStoreFiles=7
> >
> > 1. Could you recommend an alternative configuration that is more suitable
> > for heavy loads?
> >
> Hard to say without more details. Offhand, you might want to lower the
> flush size and the upper/lower limits. You should monitor the cluster during
> loads, using the web UI to see each regionserver's memory usage. If you click
> through to an individual regionserver you can see how the heap is being
> used.
> > 2. We still don't understand why we get a region server OOME only once
> > every few days (and not every day, since each day we insert the same
> > amount of data) and why the region server heap size is growing
> > constantly. We expect that after memstore flushing the heap will go back
> > to normal, but this doesn't happen until we restart HBase.
> >
> I would suspect it is happening when a regionserver has a large
> StoreFileIndex and is hosting a particularly hot region that is getting lots
> of updates. When those events coincide on a single server it OOMEs.
> > 3. We know the exact start time of our HBase job. Can we force a memstore
> > flush before starting the job?
> >
> >
> From the HBase shell, run:
>     flush 'table_name'
> I would highly recommend looking at how the region servers are using heap
> when you first start them up and see how large your StoreFileIndex is.
> -chris
> >
> > On Wed, Aug 24, 2011 at 7:06 PM, Chris Tarnas <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >> We had a similar OOME problem and we solved it by allocating more heap
> >> space. The underlying cause for us was as the table grew, the