HBase, mail # user - Region server crashes


Lior Schachter 2012-03-25, 08:23
Jean-Daniel Cryans 2012-03-26, 17:43
Re: Region server crashes
Lior Schachter 2012-03-27, 09:54
Thanks for the detailed answer.
We are running 0.90.2, and the problem was resolved after running a major
compaction manually.

It seems that the problem was with client requests waiting in the queue (so
I don't understand why major compaction solved it...).

Anyhow, I will try to apply the configuration changes you sent and see if
they eliminate the problem.

Lior

On Mon, Mar 26, 2012 at 7:43 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On Sun, Mar 25, 2012 at 1:23 AM, Lior Schachter <[EMAIL PROTECTED]>
> wrote:
> > Hi all,
> > We use hbase 0.9.2. We recently started to experience region servers
>
> You mean 0.90.2? Or 0.92.0?
>
> > crashes under heavy load (2-3 different servers crash each load).
> > Seems like a missing block in HDFS causes a full GC and regions are being
> > closed.
>
> Not at all.
>
> So first we can see that your region server was doing Full GCs back to
> back because it's not able to collect anything (look how the numbers
> aren't decreasing). This eventually leads to a session timeout in
> ZooKeeper, and at some point your region server woke up and saw that it
> had lost control of the HDFS files because of IO fencing (I know those
> exceptions look bad, but it's "normal").
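
[Editor's note: back-to-back full GCs like the ones described above were
commonly mitigated in this era by tuning the CMS collector in
conf/hbase-env.sh. A hedged sketch; the flag values are illustrative
assumptions, not recommendations, so verify them against your JVM:]

```shell
# conf/hbase-env.sh -- illustrative CMS settings for a 2012-era HotSpot JVM.
# Starting CMS earlier (at 70% old-gen occupancy) gives the concurrent
# collector headroom and makes concurrent mode failures less likely.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+CMSParallelRemarkEnabled"
```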
>
> Now to fix this, there are multiple avenues to explore. The non-stop
> GCing means that the memory is completely full, but it grew slowly
> enough that it hit a concurrent mode failure before an
> OutOfMemoryError. Here's what's using memory in HBase:
>
>  - MemStores
>  - Block Cache
>  - Client requests
>  - Background tasks like flushing and compacting
>
> The first two you can control the amount of memory they use, have you
> tweaked that?
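
[Editor's note: the two knobs alluded to here are hbase-site.xml
properties. A sketch with what I recall as the 0.90-era defaults; check
hbase-default.xml for your version before changing them:]

```xml
<!-- hbase-site.xml: caps on the two big consumers of region server heap -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- max fraction of heap all memstores together may use -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value> <!-- fraction of heap given to the block cache -->
</property>
```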
>
> And you say this is happening under heavy load (I'm guessing
> inserts?), so it might be that the client requests carry payloads that
> the region server can't possibly hold all at the same time. In 0.90
> and 0.92 the amount of memory dedicated to this is unbounded;
> starting in 0.94 this will come in handy:
>
> https://issues.apache.org/jira/browse/HBASE-5190
>
> At the same time I'm pretty sure you have some blocking and/or
> splitting going on and the client requests are just sitting there in
> the region server memory (grep -i your logs for "block" to confirm)
> while this happens.
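
[Editor's note: the grep suggested above runs against the region server
logs. A minimal self-contained sketch; the log line, its wording, and the
path are made up for illustration only:]

```shell
# Write one sample line of the kind a blocked region server might log
# (the message text is an invented illustration), then count
# case-insensitive matches for "block", as suggested above.
printf '%s\n' \
  'WARN regionserver.MemStoreFlusher: Blocking updates: memstore size is >= blocking limit' \
  > /tmp/rs-sample.log
grep -ic 'block' /tmp/rs-sample.log
```

On a real cluster you would point the grep at the files under
$HBASE_HOME/logs/ instead of the sample file.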
>
> At this point there are three things you can do:
>
>  - Use bulk loading instead of brute forcing it into HBase.
>  - Tune HBase in order to block as little as possible if you still like
> brute forcing. This means setting bigger regions, bigger memstores,
> and raising the store file count at which writes block above 7 files.
>  - Tune the IPC queue capacity in order to have fewer calls sitting in
> the RS memory. If you're on 0.90.2 the "easy" way to do it is to have
> fewer handlers by setting hbase.regionserver.handler.count to less than
> 10. On 0.92 you can tweak it more directly with
> ipc.server.max.queue.size, where the default is 100 (or 1000 in 0.90.2
> FWIW). This was all discussed in
> https://issues.apache.org/jira/browse/HBASE-3813
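
[Editor's note: pulled together, the knobs named in the second and third
options above live in hbase-site.xml. The values below are illustrative
assumptions for a write-heavy load, not recommendations:]

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>4294967296</value> <!-- 4 GB regions: fewer splits under load -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB memstores: fewer, larger flushes -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value> <!-- block writes later than the default of 7 files -->
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>8</value> <!-- fewer handlers = fewer in-flight payloads on 0.90.2 -->
</property>
```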
>
>
> Hope this helps in some ways,
>
> J-D
>