HBase >> mail # dev >> RegionServerSnapshotManager shutdown problem


Richard Ding 2013-03-07, 01:04
Ted Yu 2013-03-07, 01:19
Re: RegionServerSnapshotManager shutdown problem
Richard:
If you can try out the fix from HBASE-8019, that would be great.

Meanwhile, I will run the fix through the 0.94 test suite.

Cheers
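
A minimal sketch of the failure mode described in the report quoted below, where a subsystem's stop() closes a session that its owner still needs. Every name here (SharedSession, BorrowingManager, shutDown, the example path) is hypothetical, plain Java stands in for ZooKeeper, and this is not the HBASE-7779 or HBASE-8019 patch:

```java
class SessionOwnershipSketch {
    /** Hypothetical stand-in for a ZooKeeper session shared by several components. */
    static class SharedSession {
        private boolean open = true;

        void close() { open = false; }

        /** Fails once the session is closed, loosely mimicking a session-expired error. */
        void delete(String path) {
            if (!open) {
                throw new IllegalStateException("Session expired for " + path);
            }
        }
    }

    /** Hypothetical subsystem that borrows the session but does not own it. */
    static class BorrowingManager {
        private final SharedSession session;
        private final boolean closesSessionOnStop;

        BorrowingManager(SharedSession session, boolean closesSessionOnStop) {
            this.session = session;
            this.closesSessionOnStop = closesSessionOnStop;
        }

        void stop() {
            // The problematic variant closes a session it merely borrowed.
            if (closesSessionOnStop) {
                session.close();
            }
        }
    }

    /** Hypothetical owner shutdown; returns true if the node delete succeeded. */
    static boolean shutDown(boolean managerClosesSession) {
        SharedSession session = new SharedSession();
        BorrowingManager manager = new BorrowingManager(session, managerClosesSession);
        manager.stop();                               // subsystems stop first
        try {
            session.delete("/hbase/rs/example-server"); // owner still needs the session here
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            session.close();                          // only the owner closes the session
        }
    }

    public static void main(String[] args) {
        System.out.println(shutDown(true));  // prints "false"
        System.out.println(shutDown(false)); // prints "true"
    }
}
```

The design point is ownership: in this sketch the delete only succeeds when the component that created the session is the only one that closes it.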

On Wed, Mar 6, 2013 at 5:19 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Looks like the fix from HBASE-7779 wasn't included.
> See:
> https://issues.apache.org/jira/secure/attachment/12568663/7779-v2.txt
>
> I have created HBASE-8019 for this issue.
>
> Thanks for reporting.
>
>
> On Wed, Mar 6, 2013 at 5:04 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
>
>> While trying the snapshot code in the HBase 0.94 branch (which should be
>> the same as 0.94.6RC0), we found that HBase region servers take a long
>> time to shut down (see the log below). This problem does not exist in
>> 0.94.5. It looks like the RegionServerSnapshotManager.stop() method closes
>> the ZK session, which results in a SessionExpiredException when
>> HRegionServer tries to delete its ephemeral node (deleteMyEphemeralNode).
>> ... ...
>> 2013-03-06 11:53:19,767 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 256000ms before retry #8...
>> 2013-03-06 11:57:35,806 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
>> 2013-03-06 11:57:35,806 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 512000ms before retry #9...
>> 2013-03-06 12:06:07,882 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
>> 2013-03-06 12:06:07,882 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1024000ms before retry #10...
>> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
>> 2013-03-06 12:23:12,034 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 10 retries
>> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>     at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
>>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:999)
>>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:988)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1097)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:875)
>>     at java.lang.Thread.run(Thread.java:738)
>> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server hdtest010.svl.ibm.com,60020,1362529262252; zookeeper connection closed.
>> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
>> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-12,5,main]
>> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
>> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
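
The sleep values in the log above double with each retry (256000ms, 512000ms, 1024000ms), which is why the shutdown stalls for so long. A minimal sketch of that arithmetic, assuming the sleep before retry n is 1000ms * 2^n (inferred only from the three values shown; the class and method names are hypothetical and this is not the RetryCounter source):

```java
class BackoffEstimate {
    // Assumption inferred from the log: 256000ms before retry #8 equals 1000ms * 2^8.
    static final long BASE_SLEEP_MS = 1000L;

    /** Sleep before the given retry number under the assumed doubling rule. */
    static long sleepBeforeRetryMs(int retry) {
        return BASE_SLEEP_MS << retry;
    }

    /** Total sleep across an inclusive range of retry numbers. */
    static long totalSleepMs(int firstRetry, int lastRetry) {
        long total = 0;
        for (int retry = firstRetry; retry <= lastRetry; retry++) {
            total += sleepBeforeRetryMs(retry);
        }
        return total;
    }

    public static void main(String[] args) {
        // Only retries #8 to #10 appear in the quoted log; earlier ones are elided.
        for (int retry = 8; retry <= 10; retry++) {
            System.out.println("retry #" + retry + ": " + sleepBeforeRetryMs(retry) + "ms");
        }
        System.out.println("total: " + totalSleepMs(8, 10) + "ms"); // prints "total: 1792000ms"
    }
}
```

Retries #8 through #10 alone add up to 1792000ms, roughly 30 minutes, which matches the gap between the 11:53:19 and 12:23:12 timestamps in the log.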
Richard Ding 2013-03-07, 01:47