HBase >> mail # user >> help why do my regionservers shut themselves down?


Re: help why do my regionservers shut themselves down?
Thanks, everyone, for responding.

No, I don't have the GC logs, and I don't know how to get them. But
it seems that the regionserver did recover from that and then got into
trouble here:

2013-04-22 16:47:56,830 INFO
org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by
user:
java.io.InterruptedIOException: Aborting compaction of store f in region
t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
because user requested stop.

The part that I don't understand is what it means when it says
"compaction interrupted by user"!

And to answer your question, Ted: I am using HBase 0.90.6 over Hadoop 1.1.1 (I
can't upgrade, since Gora so far only works with 0.90.x). And no,
everything was normal as far as I could tell. The map jobs were staggering
since, I assume, HBase had become unresponsive (the web interface
started showing exceptions, and that is how I figured out that the
regionserver was down). While I was restarting this one (through the
status command in the shell), I noticed that two more regionservers had gone down
(with an identical error: the second one, not the one about the GC pause).
Once I restarted the regionservers (using hbase-daemon.sh),
everything went back to normal. But this keeps happening, and as a
result I can't leave my jobs unsupervised.
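
For anyone reading along, here is a minimal sketch of how a regionserver log could be scanned for long silences between entries, which may hint at a GC pause or a network stall around the time of the abort. It is not part of HBase; the function name `find_gaps` and the 30-second threshold are assumptions for illustration, and it relies only on the timestamp format seen in the log lines in this thread.

```python
import re
from datetime import datetime

# Timestamp at the start of a log entry, e.g. "2013-04-22 16:47:56,830".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_seconds=30.0):
    """Return (previous_ts, current_ts, gap_seconds) for every pair of
    consecutive timestamped entries further apart than the threshold."""
    gaps = []
    prev = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            # Continuation lines (stack traces, wrapped text) have no timestamp.
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_seconds:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


if __name__ == "__main__":
    # Sample entries using timestamps from this thread; in practice the lines
    # would come from the regionserver log file.
    sample = [
        "2013-04-22 16:47:21,843 FATAL HRegionServer: ABORTING region server",
        "        at org.apache.zookeeper.ClientCnxn$EventThread.run(...)",
        "2013-04-22 16:47:56,830 INFO HRegion: compaction interrupted by user:",
    ]
    for before, after, seconds in find_gaps(sample):
        print(f"{seconds:.3f}s of silence between {before} and {after}")
```

A gap in the log is only a hint, not proof of a GC pause; the JVM's own GC log remains the more direct evidence.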

thanks,

On 04/22/2013 07:35 PM, Ted Yu wrote:
> Kaveh:
> What version of HBase are you using ?
> Around 2013-04-22 16:47:56, did you observe anything else happening in your
> cluster ? See below:
>
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction interrupted by user:
> java.io.InterruptedIOException: Aborting compaction of store f in region
> t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
> because user requested stop.
>          at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
>          at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
>          at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
>
> On Mon, Apr 22, 2013 at 6:46 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Kaveh,
>>
>> the response is maybe already displayed in the logs you sent ;)
>>
>> "This disconnect could have been caused by a network partition or a
>> long-running GC pause, either way it's recommended that you verify
>> your environment."
>>
>> Do you have GC logs? Have you tried anything to solve that?
>>
>> JM
>>
>> 2013/4/22 kaveh minooie <[EMAIL PROTECTED]>:
>>> Hi
>>>
>>> After a few MapReduce jobs, my regionservers shut themselves down. This is
>>> the latest time that this has happened:
>>>
>>> 2013-04-22 16:47:21,843 INFO
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> This client just lost it's session with ZooKeeper, trying to reconnect.
>>> 2013-04-22 16:47:21,843 FATAL
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
>>> serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392,
>>> regions=196, usedHeap=1063, maxHeap=3966):
>>> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661
>>> regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired
>>> from ZooKeeper, aborting
>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>> KeeperErrorCode = Session expired
>>>          at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>>>          at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>>>          at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>>>          at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
>>> 2013-04-22 16:47:21,843 INFO
>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>> Trying to reconnect to zookeeper.