HBase >> mail # user >> RegionServers Crashing every hour in production env


Re: RegionServers Crashing every hour in production env
Hi Sreepathi,
the book (or the site) says we could try it to see whether it really is a
timeout error or something more, but that it is not recommended for
production environments.
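In HBase, `hbase.rpc.timeout` is set in `hbase-site.xml` as a `<property>` with `<name>` and `<value>` children, the value being in milliseconds. The Python sketch below only illustrates the shape of that entry for the 5-minute (300000 ms) value discussed in this thread; the helper name `rpc_timeout_property` is hypothetical, and the generated element would still have to be placed inside the file's `<configuration>` root.

```python
import xml.etree.ElementTree as ET


def rpc_timeout_property(timeout_ms: int) -> str:
    """Build an hbase-site.xml <property> element for hbase.rpc.timeout.

    Hypothetical helper for illustration; the value is in milliseconds.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.rpc.timeout"
    ET.SubElement(prop, "value").text = str(timeout_ms)
    return ET.tostring(prop, encoding="unicode")


# 5 minutes expressed in milliseconds: the value proposed in the thread.
FIVE_MINUTES_MS = 5 * 60 * 1000

snippet = rpc_timeout_property(FIVE_MINUTES_MS)
print(snippet)
```

Building the fragment with `xml.etree.ElementTree` rather than string formatting keeps the markup well-formed; for a one-off change, editing the file by hand is just as good.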

I could give it a try if five minutes would tell us whether the problem is
the GC or something else! Anyway, I find it hard to believe that a GC is
taking 2 minutes and 30 seconds.
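The point raised later in this thread (a pause long enough to kill the ZooKeeper session) comes down to simple arithmetic: a stop-the-world pause only causes an expiry if it lasts longer than the timeout in question. The Python sketch below compares the 2:30 pause against two timeouts. `expired_timeouts` is a hypothetical helper; `hbase.rpc.timeout = 300000` is the value proposed in the thread, while the `zookeeper.session.timeout` of 60000 ms is an assumed example, not this cluster's actual setting.

```python
def expired_timeouts(pause_ms: int, timeouts_ms: dict[str, int]) -> list[str]:
    """Return, sorted, the names of the timeouts a pause of pause_ms would exceed.

    Hypothetical helper for illustration only.
    """
    return sorted(name for name, limit in timeouts_ms.items() if pause_ms > limit)


# The 2:30 pause discussed in the thread, in milliseconds (150000 ms).
PAUSE_MS = (2 * 60 + 30) * 1000

# Only hbase.rpc.timeout comes from the thread; the ZooKeeper value is an
# ASSUMED example, not the configuration of the cluster discussed here.
timeouts = {
    "hbase.rpc.timeout": 300_000,
    "zookeeper.session.timeout": 60_000,
}

print(expired_timeouts(PAUSE_MS, timeouts))
```

Under these assumptions, raising `hbase.rpc.timeout` alone would not keep the RegionServer alive, because the ZooKeeper session would still expire first. The ZooKeeper server also negotiates the session timeout within its own `minSessionTimeout`/`maxSessionTimeout` bounds, so the value HBase requests is not necessarily the one in effect.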

Abs,
Pablo

On 03/10/2013 04:06 PM, Sreepathi wrote:
> Hi Stack/Ted/Pablo,
>
> Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?
>
> Regards,
> - Sreepathi
>
> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <[EMAIL PROTECTED]> wrote:
>
>>> That combo should be fine.
>> Great!!
>>
>>
>>> If JVM is full GC'ing, the application is stopped.
>>> The below does not look like a full GC but that is a long pause in system
>>> time, enough to kill your zk session.
>> Exactly. This pause is really making ZK expire the RS, which then shuts
>> down (logs at the end of the email).
>> But the question is: what is causing this pause??!!
>>
>>> You swapping?
>> I don't think so (stats below).
>>
>>> Hardware is good?
>> Yes, it is a 16-processor machine with 74GB of RAM and plenty of disk space.
>> Below are some metrics I have heard about. Hope it helps.
>>
>>
>> ** I am having some problems with the datanodes[1], which are having
>> trouble writing. I really think the issues are related, but I cannot solve
>> either of them :(
>>
>> Thanks again,
>> Pablo
>>
>> [1] http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCr[EMAIL PROTECTED]%3E
>>
>> top - 15:38:04 up 297 days, 21:03,  2 users,  load average: 4.34, 2.55, 1.28
>> Tasks: 528 total,   1 running, 527 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  74187256k total, 29493992k used, 44693264k free,  5836576k buffers
>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k cached
>>
>> ]$ vmstat -w
>> procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
>>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
>>  2  0     128312   32416928    5838288    5043560    0    0   202    53    0    0   2  1  96  1  0
>>
>> ]$ sar
>>                  CPU     %user     %nice   %system   %iowait    %steal     %idle
>> 02:20:01 PM     all     26.18      0.00      2.90      0.63      0.00     70.29
>> 02:30:01 PM     all      1.66      0.00      1.25      1.05      0.00     96.04
>> 02:40:01 PM     all     10.01      0.00      2.14      0.75      0.00     87.11
>> 02:50:01 PM     all      0.76      0.00      0.80      1.03      0.00     97.40
>> 03:00:01 PM     all      0.23      0.00      0.30      0.71      0.00     98.76
>> 03:10:01 PM     all      0.22      0.00      0.30      0.66      0.00     98.82
>> 03:20:01 PM     all      0.22      0.00      0.31      0.76      0.00     98.71
>> 03:30:01 PM     all      0.24      0.00      0.31      0.64      0.00     98.81
>> 03:40:01 PM     all      1.13      0.00      2.97      1.18      0.00     94.73
>> Average:        all      3.86      0.00      1.38      0.88      0.00     93.87
>>
>> ]$ iostat
>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002)     03/10/2013     _x86_64_    (16 CPU)
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>             1.86    0.00    0.96    0.78    0.00   96.41
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s      Blk_read     Blk_wrtn
>> sda               1.23        20.26        23.53     521533196    605566924
>> sdb               6.51       921.55       241.90   23717850730   6225863488
>> sdc               6.22       921.83       236.41   23725181162   6084471192
>> sdd               6.25       925.13       237.26   23810004970   6106357880
>> sde               6.19       913.90       235.60   23521108818   6063722504
>> sdh               6.26       933.08       237.77   24014594546   6119511376
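To answer "You swapping?" from the `vmstat` output quoted above, the columns to look at are `si` and `so` (memory swapped in from disk and out to disk, per second); `swpd` only shows how much swap is in use. The Python sketch below parses that output and flags any sample with nonzero `si` or `so`; the function names are hypothetical. One caveat: the first line `vmstat` prints reports averages since boot, so on a host that has been up 297 days it cannot reveal a short swap burst around a crash; sampling at an interval (for example `vmstat 5`) while the problem happens would be more telling.

```python
# The vmstat -w output quoted in the thread, with its line wrapping undone.
VMSTAT_OUTPUT = """\
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
 2  0     128312   32416928    5838288    5043560    0    0   202    53    0    0   2  1  96  1  0
"""


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat output into one dict per sample, keyed by column name.

    Hypothetical helper: assumes line 1 is the group banner, line 2 the
    column names, and every following line a numeric sample.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[1]
    return [dict(zip(header, map(int, row))) for row in rows[2:]]


def is_swapping(samples: list[dict[str, int]]) -> bool:
    """True if any sample shows pages moving in (si) or out (so) of swap."""
    return any(s["si"] > 0 or s["so"] > 0 for s in samples)


samples = parse_vmstat(VMSTAT_OUTPUT)
print(is_swapping(samples))
```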