HBase >> mail # user >> RegionServers Crashing every hour in production env


Re: RegionServers Crashing every hour in production env
You could increase your zookeeper session timeout to 5 minutes while you
are figuring out why these long pauses happen.
http://hbase.apache.org/book.html#zookeeper.session.timeout
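
One thing worth checking before relying on a larger timeout: as far as I understand the ZooKeeper admin documentation, the server clamps whatever session timeout the client requests to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The sketch below only illustrates that clamping arithmetic. The function name is hypothetical, and the tickTime of 2000 ms is an assumption (the value used in the sample zoo.cfg), not something stated in this thread.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the server-side bounds.

    Assumption: when min_ms/max_ms are not configured, the bounds are
    2 * tickTime and 20 * tickTime.
    """
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lo, min(requested_ms, hi))


five_minutes_ms = 5 * 60 * 1000  # 300000 ms

# With default bounds the request is capped at 20 * 2000 ms.
print(negotiated_session_timeout(five_minutes_ms))  # 40000

# With the server-side maximum raised to match, the full value is granted.
print(negotiated_session_timeout(five_minutes_ms, max_ms=five_minutes_ms))  # 300000
```

If the ensemble is not managed by HBase, raising zookeeper.session.timeout alone may therefore have no effect until the server-side maximum is raised as well.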

Above, there is an outage of almost 4 minutes:

>> We slept 225100ms instead of 3000ms, this is likely due to a long
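
To put the quoted warning in numbers, here is a minimal sketch that pulls the two durations out of the log line and compares the extra stall against a candidate session timeout. The regular expression matches only the part of the message visible in this thread, the function names are hypothetical, and treating "slept minus expected" as the stall length is a rough heuristic rather than how ZooKeeper itself decides on expiry.

```python
import re

# Matches only the leading part of the warning; the rest of the line is
# cut off in the quoted message.
SLEPT_RE = re.compile(r"We slept (\d+)ms instead of (\d+)ms")


def parse_pause(line):
    """Return (slept_ms, expected_ms, stall_ms), or None if no match."""
    m = SLEPT_RE.search(line)
    if m is None:
        return None
    slept_ms, expected_ms = int(m.group(1)), int(m.group(2))
    return slept_ms, expected_ms, slept_ms - expected_ms


def stall_exceeds_timeout(line, session_timeout_ms):
    """Heuristic: did the process stall longer than the session timeout?"""
    parsed = parse_pause(line)
    return parsed is not None and parsed[2] > session_timeout_ms


log_line = "We slept 225100ms instead of 3000ms, this is likely due to a long"
slept, expected, stall = parse_pause(log_line)
print(f"stall of {stall} ms ({stall / 60000:.2f} min)")
print(stall_exceeds_timeout(log_line, 300000))  # 5 minute timeout
```

By this measure the stall is about 3.7 minutes, so a 5 minute session timeout would have survived this particular pause.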

You have ganglia or tsdb running?  When you see the big pause above, can
you see anything going on on the machine?  (swap, iowait, concurrent fat
mapreduce job?)

St.Ack

On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa <[EMAIL PROTECTED]> wrote:

> Hi Sreepathi,
> they say in the book (or the site), we could try it to see if it is really
> a timeout error
> or there is something more. But it is not recommended for production
> environments.
>
> I could give it a try if five minutes will tell us whether the problem
> is the GC or
> something else!! Anyway, I think it is hard to believe a GC is taking 2:30
> minutes.
>
> Abs,
> Pablo
>
>
> On 03/10/2013 04:06 PM, Sreepathi wrote:
>
>> Hi Stack/Ted/Pablo,
>>
>> Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?
>>
>> Regards,
>> - Sreepathi
>>
>> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <[EMAIL PROTECTED]> wrote:
>>
>>>> That combo should be fine.
>>>>
>>> Great!!
>>>
>>>
>>>> If JVM is full GC'ing, the application is stopped.
>>>> The below does not look like a full GC but that is a long pause in
>>>> system time, enough to kill your zk session.
>>>>
>>> Exactly. This pause is really making zk expire the RS, which shuts down
>>> (logs at the end of the email).
>>> But the question is: what is causing this pause??!!
>>>
>>>> You swapping?
>>>>
>>> I don't think so (stats below).
>>>
>>>> Hardware is good?
>>>>
>>> Yes, it is a 16 processor machine with 74GB of RAM and plenty of disk space.
>>> Below are some metrics I have heard about. Hope it helps.
>>>
>>>
>>> ** I am having some problems with the datanodes[1], which are having
>>> trouble writing. I really think the issues are related, but I cannot
>>> solve any of them :(
>>>
>>> Thanks again,
>>> Pablo
>>>
>>> [1] http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCr[EMAIL PROTECTED]%3E
>>>
>>> top - 15:38:04 up 297 days, 21:03,  2 users,  load average: 4.34, 2.55, 1.28
>>> Tasks: 528 total,   1 running, 527 sleeping,   0 stopped,   0 zombie
>>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Mem:  74187256k total, 29493992k used, 44693264k free,  5836576k buffers
>>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k cached
>>>
>>> ]$ vmstat -w
>>> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>>>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
>>>  2  0     128312   32416928    5838288    5043560    0    0   202    53    0    0   2  1  96  1  0
>>>
>>> ]$ sar
>>> 02:20:01 PM     all     26.18      0.00      2.90      0.63      0.00     70.29
>>> 02:30:01 PM     all      1.66      0.00      1.25      1.05      0.00     96.04
>>> 02:40:01 PM     all     10.01      0.00      2.14      0.75      0.00     87.11
>>> 02:50:01 PM     all      0.76      0.00      0.80      1.03      0.00     97.40
>>> 03:00:01 PM     all      0.23      0.00      0.30      0.71      0.00     98.76
>>> 03:10:01 PM     all      0.22      0.00      0.30      0.66      0.00     98.82
>>> 03:20:01 PM     all      0.22      0.00      0.31      0.76      0.00     98.71
>>> 03:30:01 PM     all      0.24      0.00      0.31      0.64      0.00     98.81
>>> 03:40:01 PM     all      1.13      0.00      2.97      1.18      0.00
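
The swapping question from the quoted exchange can be checked against the `top` snapshot with a few lines of Python. This is only a sketch: the function name is hypothetical, the regular expression assumes the `Swap:` line layout shown in the quoted output, and a single snapshot says nothing about what the machine was doing during the pause itself.

```python
import re

# Assumes the "Swap: <n>k total, <n>k used, ..." layout from the quoted top output.
SWAP_RE = re.compile(r"Swap:\s+(\d+)k total,\s+(\d+)k used")


def swap_used_fraction(top_output):
    """Return used/total swap as a fraction, or None if no Swap line is found."""
    m = SWAP_RE.search(top_output)
    if m is None:
        return None
    total_k, used_k = int(m.group(1)), int(m.group(2))
    return used_k / total_k if total_k else 0.0


swap_line = ">>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k cached"
frac = swap_used_fraction(swap_line)
print(f"{frac:.2%} of swap in use")  # 0.25% of swap in use
```

Together with the zero `si`/`so` columns in the `vmstat` output, this is consistent with the machine not swapping at the time the snapshot was taken.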