HBase >> mail # user >> help why do my regionservers shut themselves down?


kaveh minooie 2013-04-23, 01:25
Leonid Fedotov 2013-04-23, 15:59
Jean-Marc Spaggiari 2013-04-23, 01:46
Ted Yu 2013-04-23, 02:35
Re: help why do my regionservers shut themselves down?
Thanks, everyone, for responding.

No, I don't have the GC logs. I don't even know how I can get them. But
it seems that the regionserver did recover from that and then got into
trouble here:

2013-04-22 16:47:56,830 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by
user:
java.io.InterruptedIOException: Aborting compaction of store f in region
t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
because user requested stop.

The part that I don't understand is what it means when it says
"compaction interrupted by user"!
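
For context on the timing, the two log excerpts in this thread can be compared directly. This is a minimal Python sketch, assuming the timestamp layout seen in the excerpts (`YYYY-MM-DD HH:MM:SS,mmm` at the start of each line); the helper name `parse_ts` is made up for illustration:

```python
from datetime import datetime

def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    # The first 23 characters hold the date, time, and milliseconds.
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")

# Both lines are taken from the excerpts quoted in this thread.
abort_line = ("2013-04-22 16:47:21,843 FATAL "
              "org.apache.hadoop.hbase.regionserver.HRegionServer: "
              "ABORTING region server")
compaction_line = ("2013-04-22 16:47:56,830 INFO "
                   "org.apache.hadoop.hbase.regionserver.HRegion: "
                   "compaction interrupted by user:")

gap = parse_ts(compaction_line) - parse_ts(abort_line)
print(gap.total_seconds())  # seconds between the two messages
```

In these excerpts the compaction message was logged about 35 seconds after the ABORTING line, so it follows the ZooKeeper session expiry rather than preceding it.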

And to answer your question, Ted: I am using HBase 0.90.6 over Hadoop
1.1.1 (I can't upgrade, since Gora so far only works with 0.90.x). And
no, everything was normal as far as I could tell. The map jobs were
staggering since, I assume, HBase had become unresponsive (the web
interface started showing exceptions, and that is how I figured out that
the regionserver was down). While I was restarting this one (through the
status command in the shell), I noticed that two more regionservers had
gone down (with an identical error: the second one, not the one about the
GC pause). Once I restarted the regionservers (using hbase-daemon.sh),
everything went back to normal. But this keeps happening, and as a
result I can't leave my jobs unsupervised.

thanks,

On 04/22/2013 07:35 PM, Ted Yu wrote:
> Kaveh:
> What version of HBase are you using ?
> Around 2013-04-22 16:47:56, did you observe anything else happening in your
> cluster ? See below:
>
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction interrupted by user:
> java.io.InterruptedIOException: Aborting compaction of store f in region
> t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
> because user requested stop.
>          at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
>          at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
>          at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
>
> On Mon, Apr 22, 2013 at 6:46 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Kaveh,
>>
>> the answer may already be in the logs you sent ;)
>>
>> "This disconnect could have been caused by a network partition or a
>> long-running GC pause, either way it's recommended that you verify
>> your environment."
>>
>> Do you have GC logs? Have you tried anything to solve that?
>>
>> JM
>>
>> 2013/4/22 kaveh minooie <[EMAIL PROTECTED]>:
>>> Hi
>>>
>>> After a few MapReduce jobs, my regionservers shut themselves down. This is
>>> the latest occurrence:
>>>
>>> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> This client just lost it's session with ZooKeeper, trying to reconnect.
>>> 2013-04-22 16:47:21,843 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
>>> serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392, regions=196, usedHeap=1063, maxHeap=3966):
>>> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
>>> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>>>          at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>>>          at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>>>          at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>>>          at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
>>> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> Trying to reconnect to zookeeper.
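
Since a long GC pause is one of the suspected causes, the heap figures in the ABORTING line of the original post are worth reading off. This is a rough Python sketch that extracts `usedHeap` and `maxHeap` from that load string and computes the ratio; the regular expression and variable names are illustrative, and the values are assumed to share the same unit:

```python
import re

# Fragment of the load string from the ABORTING line in the original post.
load = "regions=196, usedHeap=1063, maxHeap=3966"

match = re.search(r"usedHeap=(\d+), maxHeap=(\d+)", load)
used, max_heap = int(match.group(1)), int(match.group(2))

# Share of the maximum heap in use when the load string was written.
pct = 100.0 * used / max_heap
print(round(pct, 1))
```

Here the regionserver reported roughly 27% of its maximum heap in use. This only reflects the heap at the moment the line was logged, so it does not by itself confirm or rule out a long GC pause.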
Kevin Odell 2013-04-23, 10:15