NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> RegionServers Crashing every hour in production env


Re: RegionServers Crashing every hour in production env
Hi Sreepathi,
the book (or the site) says we could try it to check whether it is really
a timeout error or something more. But it is not recommended for
production environments.

I could give it a try if five minutes would tell us whether the problem
is the GC or something else. Anyway, I find it hard to believe that a GC
takes 2:30 minutes.

Cheers,
Pablo
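Before raising `hbase.rpc.timeout`, it may be cheaper to check whether a pause of roughly 2:30 minutes actually appears in the RegionServer's GC log. Below is a minimal Python sketch, assuming the RegionServer JVM is a pre-Java 9 HotSpot with GC logging enabled (e.g. `-verbose:gc -XX:+PrintGCDetails`), which appends a `[Times: user=... sys=..., real=... secs]` summary to each GC event. `long_pauses` and `Pause` are hypothetical helpers, not part of HBase, and `threshold_secs` should be whatever ZooKeeper session timeout your cluster actually ends up with.

```python
import re
from dataclasses import dataclass
from typing import List

# HotSpot (pre-Java 9) appends this summary to each GC event when GC logging
# is enabled, e.g. "[Times: user=0.10 sys=0.02, real=0.05 secs]".
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) sys=(?P<sys>\d+\.\d+),"
    r" real=(?P<real>\d+\.\d+) secs\]"
)


@dataclass
class Pause:
    line_no: int  # 1-based line number in the log
    user: float   # CPU seconds spent in user mode
    sys: float    # CPU seconds spent in the kernel
    real: float   # wall-clock seconds of the GC event

    @property
    def sys_dominated(self) -> bool:
        # Heuristic: more kernel time than user time suggests the stall came
        # from the OS (paging, I/O, ...) rather than from the collector's work.
        return self.sys > self.user


def long_pauses(log_text: str, threshold_secs: float) -> List[Pause]:
    """Return every stop-the-world GC event longer than threshold_secs."""
    found: List[Pause] = []
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        # Assumption: concurrent CMS phases are labelled "concurrent" and run
        # alongside the application, so their wall-clock time is not a pause.
        if "concurrent" in line:
            continue
        m = TIMES_RE.search(line)
        if m is None:
            continue  # not a GC timing line
        pause = Pause(line_no, float(m["user"]), float(m["sys"]), float(m["real"]))
        if pause.real > threshold_secs:
            found.append(pause)
    return found
```

For example, `long_pauses(open("regionserver-gc.log").read(), 40.0)` (file name and threshold are illustrative) lists every event longer than 40 s. A hit with `real` near 150 s would support the GC theory, while an empty result for the crash window makes it less likely. A `sys_dominated` hit would match Stack's observation that the pause is in system time rather than in the collector itself.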

On 03/10/2013 04:06 PM, Sreepathi wrote:
> Hi Stack/Ted/Pablo,
>
> Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?
>
> Regards,
> - Sreepathi
>
> On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa <[EMAIL PROTECTED]> wrote:
>
>>> That combo should be fine.
>> Great!!
>>
>>
>>> If JVM is full GC'ing, the application is stopped.
>>> The below does not look like a full GC but that is a long pause in system
>>> time, enough to kill your zk session.
>> Exactly. This pause is really making ZooKeeper expire the RS, which then
>> shuts down (logs at the end of the email).
>> But the question is: what is causing this pause??!!
>>
>>> You swapping?
>> I don't think so (stats below).
>>
>>> Hardware is good?
>> Yes, it is a 16-processor machine with 74GB of RAM and plenty of disk space.
>> Below are some metrics I have heard are worth checking. Hope it helps.
>>
>>
>> ** I am also having problems with the datanodes [1], which are having
>> trouble writing. I really think the issues are related, but I cannot
>> solve either of them :(
>>
>> Thanks again,
>> Pablo
>>
>> [1] http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201303.mbox/%3CCAJzooYfS-F1KS+jGOPUt15PwFjcCSzigE0APeM9FXaCr[EMAIL PROTECTED]%3E
>>
>> top - 15:38:04 up 297 days, 21:03,  2 users,  load average: 4.34, 2.55, 1.28
>> Tasks: 528 total,   1 running, 527 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  74187256k total, 29493992k used, 44693264k free,  5836576k buffers
>> Swap: 51609592k total,   128312k used, 51481280k free,  1353400k cached
>>
>> ]$ vmstat -w
>> procs -----------------------memory---------------------- ---swap-- -----io---- --system-- -----cpu-------
>>  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
>>  2  0     128312   32416928    5838288    5043560    0    0   202    53    0    0   2  1  96  1  0
>>
>> ]$ sar
>>                 CPU     %user     %nice   %system   %iowait    %steal     %idle
>> 02:20:01 PM     all     26.18      0.00      2.90      0.63      0.00     70.29
>> 02:30:01 PM     all      1.66      0.00      1.25      1.05      0.00     96.04
>> 02:40:01 PM     all     10.01      0.00      2.14      0.75      0.00     87.11
>> 02:50:01 PM     all      0.76      0.00      0.80      1.03      0.00     97.40
>> 03:00:01 PM     all      0.23      0.00      0.30      0.71      0.00     98.76
>> 03:10:01 PM     all      0.22      0.00      0.30      0.66      0.00     98.82
>> 03:20:01 PM     all      0.22      0.00      0.31      0.76      0.00     98.71
>> 03:30:01 PM     all      0.24      0.00      0.31      0.64      0.00     98.81
>> 03:40:01 PM     all      1.13      0.00      2.97      1.18      0.00     94.73
>> Average:        all      3.86      0.00      1.38      0.88      0.00     93.87
>>
>> ]$ iostat
>> Linux 2.6.32-220.7.1.el6.x86_64 (PSLBHDN002)     03/10/2013     _x86_64_    (16 CPU)
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>             1.86    0.00    0.96    0.78    0.00   96.41
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
>> sda               1.23        20.26        23.53   521533196   605566924
>> sdb               6.51       921.55       241.90 23717850730  6225863488
>> sdc               6.22       921.83       236.41 23725181162  6084471192
>> sdd               6.25       925.13       237.26 23810004970  6106357880
>> sde               6.19       913.90       235.60 23521108818  6063722504
>> sdh               6.26       933.08       237.77 24014594546  6119511376
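Stack's "You swapping?" question can be answered more directly from the `vmstat` output: `swpd` (128312 KB here) only says that some memory has been swapped out at some point, while the `si`/`so` columns show pages actually moving in and out of swap. Below is a minimal Python sketch, assuming the `vmstat` column layout shown above; `parse_vmstat` and `swap_activity` are hypothetical helper names. Note that the first line of `vmstat` output reports averages since boot, so a run with an interval (e.g. `vmstat -w 5`) captured around the time of a crash is more telling than a single snapshot.

```python
from typing import Dict, List


def parse_vmstat(text: str) -> List[Dict[str, int]]:
    """Turn vmstat output into one {column: value} dict per sample row."""
    header: List[str] = []
    samples: List[Dict[str, int]] = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":
            # Column-name line: r b swpd free buff cache si so bi bo ...
            # vmstat repeats it periodically, so just refresh the header.
            header = tokens
        elif header and all(tok.isdigit() for tok in tokens):
            samples.append(dict(zip(header, map(int, tokens))))
        # Anything else (e.g. the "procs ---memory---" banner) is ignored.
    return samples


def swap_activity(samples: List[Dict[str, int]]) -> bool:
    """True if any sample shows pages swapped in (si) or out (so)."""
    return any(s["si"] > 0 or s["so"] > 0 for s in samples)
```

Fed the single `vmstat -w` sample quoted above, `swap_activity` returns `False` (both `si` and `so` are 0) even though `swpd` is 128312, which is consistent with Pablo's "I don't think so".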