HBase >> mail # user >> RegionServers Crashing every hour in production env

Re: RegionServers Crashing every hour in production env
Hello guys,
I stopped my research on the HBase ZK timeout for a while because of
other issues I had to handle, but I am back.

A very weird behavior I would like your comments on is that my
RegionServers perform better (fewer crashes) under heavy load than
under light load.
That is, if I leave HBase alone at 50 requestsPerSecond for the
whole day, there are more crashes than if I run a MapReduce job every hour.
Another weird thing is the following:

RS startTime = Mon Apr 01 13:22:35 BRT 2013

[...]$ grep slept /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log
2013-04-01 20:09:21,135 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 45491ms instead of 3000ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-04-01 22:45:59,407 WARN org.apache.hadoop.hbase.util.Sleeper: We
slept 101271ms instead of 3000ms, this is likely due to a long garbage
collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

[...]$ egrep 'real=[1-9][0-9][0-9][0-9]*\.[0-9][0-9]' /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out
* The report below comes from running the above command once per time range.
0.0 - 0.1   secs GCs = 5084
0.1 - 0.5   secs GCs = 9
0.5 - 1.0   secs GCs = 3
1.0 - 10    secs GCs = 0
10  - 100   secs GCs = 0
100 - 1000  secs GCs = 0
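The per-range counts above come from re-running egrep once per range. A single-pass alternative is sketched below in POSIX shell and awk. It assumes the GC output contains "real=<secs> secs" entries, as the egrep pattern implies; the function name gc_real_histogram is hypothetical, and the buckets are half-open (a 1.0 s pause lands in "1.0-10").

```shell
# Sketch (hypothetical helper): bucket every GC "real=N.NN" value read from
# stdin into the same ranges as the report, in one pass.
gc_real_histogram() {
  grep -o 'real=[0-9]*\.[0-9]*' |
  awk -F= '
    {
      t = $2 + 0                      # seconds, as a number
      if      (t < 0.1) b[1]++
      else if (t < 0.5) b[2]++
      else if (t < 1.0) b[3]++
      else if (t < 10)  b[4]++
      else if (t < 100) b[5]++
      else              b[6]++
    }
    END {
      split("0.0-0.1 0.1-0.5 0.5-1.0 1.0-10 10-100 100+", label, " ")
      for (i = 1; i <= 6; i++) printf "%-8s secs GCs = %d\n", label[i], b[i] + 0
    }'
}

# Example:
#   cat /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out | gc_real_histogram
```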

So my script, which extracts every GC time ("real=... secs"), shows that
no GC took longer than 1 second.
However, the RS log reports twice that the RS slept more than 40 seconds
instead of 3.
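To put the two numbers side by side, the sketch below extracts the extra stall from each Sleeper warning and the longest GC "real=" time. The helper names sleeper_stalls and max_gc_real are hypothetical, and the sketch assumes each Sleeper warning sits on a single line, as it does in the log file itself rather than in this wrapped mail.

```shell
# Sketch (hypothetical helper): for each "We slept Xms instead of Yms"
# warning on stdin, print the log timestamp and the extra stall (X - Y) in s.
sleeper_stalls() {
  sed -n 's/^\([0-9-]* [0-9:,]*\).*We slept \([0-9]*\)ms instead of \([0-9]*\)ms.*/\1 \2 \3/p' |
    awk '{ printf "%s %s %.1f\n", $1, $2, ($3 - $4) / 1000 }'
}

# Sketch (hypothetical helper): print the longest GC "real=" time, in seconds.
max_gc_real() {
  grep -o 'real=[0-9]*\.[0-9]*' | cut -d= -f2 | sort -n | tail -n 1
}

# Example:
#   cat /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.log | sleeper_stalls
#   cat /var/log/hbase/hbase-hbase-regionserver-PSLBHDN00*.out | max_gc_real
```

If the stalls are tens of seconds while the longest GC "real=" time stays under a second, the pause is not showing up in the GC times the script reads.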

"this is likely due to a long garbage collecting pause": yes,
this is likely, but I don't think it is the case here.

The machine is huge: 70GB RAM, 32 processors, light load,
and no swap or iowait.

Any ideas?

Thanks,
Pablo

On 03/12/2013 12:43 PM, Pablo Musa wrote:
> Guys,
> thank you very much for the help.
>
> Yesterday I spent 14 hours trying to tune the whole cluster.
> The cluster is not ready yet and needs a lot of tuning, but at least it is
> working.
>
> My first big problem was NameNode + DataNode GC. They were not using
> CMS, so their GC time kept growing: it started at 0.01 ms and
> within 20 minutes was taking 150 secs.
> After switching to CMS GC this time is much smaller, with a maximum of 70 secs,
> which is VERY HIGH, but for now it does not stop HBase.
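For reference, switching the NameNode and DataNode to CMS is done through their JVM options in hadoop-env.sh. The sketch below is an assumption, not the configuration used in this thread: HADOOP_NAMENODE_OPTS and HADOOP_DATANODE_OPTS are the standard hadoop-env.sh variables, CMS_OPTS is a hypothetical helper variable, and the occupancy settings are illustrative HotSpot flags.

```shell
# Sketch for hadoop-env.sh (illustrative flags, not the author's config).
# CMS_OPTS is a hypothetical helper variable.
CMS_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
# Start CMS cycles at a fixed old-gen occupancy instead of JVM heuristics.
CMS_OPTS="$CMS_OPTS -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"

# Prepend to whatever options are already set.
export HADOOP_NAMENODE_OPTS="$CMS_OPTS $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="$CMS_OPTS $HADOOP_DATANODE_OPTS"
```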
>
> With this issue solved, it became clear that the RS was doing long GC pauses,
> taking up to 220 secs. ZooKeeper expired the RS session and the RS shut down.
> I tried a lot of different flag configurations (MORE than 20) and could not
> get short GCs. Eventually a GC would take more than 150 secs (the ZooKeeper
> timeout) and the RS would shut down.
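One way to see which pauses cross the ZooKeeper session timeout is to turn on GC logging for the RegionServer in hbase-env.sh and filter the logged pause times. This sketch assumes a HotSpot JVM of the Java 6/7 era writing GC output to the RS .out file; -XX:+PrintGCApplicationStoppedTime additionally reports the total time application threads were stopped, which can exceed the GC phases themselves. GC_LOG_OPTS and gc_pauses_over_timeout are hypothetical names.

```shell
# Sketch for hbase-env.sh: GC logging for the RegionServer JVM
# (standard HotSpot flags; GC_LOG_OPTS is a hypothetical helper variable).
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
GC_LOG_OPTS="$GC_LOG_OPTS -XX:+PrintGCApplicationStoppedTime"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $GC_LOG_OPTS"

# Sketch (hypothetical helper): print every GC "real=" time on stdin that is
# at or above the given session timeout in seconds, i.e. pauses long enough
# for ZooKeeper to expire the RS session.
gc_pauses_over_timeout() {
  timeout_secs=$1
  grep -o 'real=[0-9]*\.[0-9]*' | cut -d= -f2 |
    awk -v t="$timeout_secs" '$1 + 0 >= t + 0'
}

# Example (150 s timeout, as in this thread):
#   cat /var/log/hbase/hbase-hbase-regionserver-*.out | gc_pauses_over_timeout 150
```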
>
> Finally I tried a config that has been working so far (12 hours) with a maximum
> GC time of 90 secs. That is of course a terrible problem, since HBase is a
> database, but at least the cluster is stable while I tune it a
> little more.
>
> In my opinion, my biggest problem is having a few "monster" machines in the
> cluster instead of a bunch of commodity machines. I don't know if many
> companies use this kind of machine inside a Hadoop cluster, but a quick
> Google search did not turn up much tuning advice for big-heap GCs.
>
> I guess my next step will be to search for big-heap GC tuning.
>
> Back to some questions ;)
>
>   > You have ganglia or tsdb running?
>
> I use Zabbix for now, and no, there is nothing going on when the big
> pause happens.
>
>   > When you see the big pause above, can you see anything going on on the
>   > machine? (swap, iowait, concurrent fat mapreduce job?)
>   > what are you doing when the long GC happens? read or write? if reading,
>   > what is the block cache size?
>
> The CPU usage of the RS process goes to 100% and the logs "pause" until it
> gets out.
> Ex: [NewPar
>
> IO and swap are normal. There is no MR job running, just the normal database
> load, which is very low. I am probably doing reads AND writes to the database,
> with the default block cache size.
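Since the block cache size came up: in HBase it is sized as a fraction of the RegionServer heap, set by hfile.block.cache.size. The arithmetic is sketched below with a hypothetical helper and a hypothetical 16 GB heap; the default fraction differs between HBase versions, so it is passed explicitly rather than assumed.

```shell
# Sketch (hypothetical helper): approximate block cache size in MB as
# heap_mb x hfile.block.cache.size, truncated to a whole number of MB.
block_cache_mb() {
  heap_mb=$1
  fraction=$2
  awk -v h="$heap_mb" -v f="$fraction" 'BEGIN { printf "%d\n", h * f }'
}

# Example with a hypothetical 16 GB heap and a fraction of 0.25:
#   block_cache_mb 16384 0.25
```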