HBase, mail # dev - RegionServerSnapshotManager shutdown problem


Re: RegionServerSnapshotManager shutdown problem
Richard Ding 2013-03-07, 01:47
Thanks Ted for the quick solution.
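
For readers following the quoted log below: the three visible RetryCounter delays (256000 ms, 512000 ms, 1024000 ms before retries #8, #9 and #10) double each time, which is why the delete of the ephemeral node holds up shutdown for roughly half an hour. Below is a minimal sketch of that arithmetic. The class name, method names, and the 1000 ms base interval are assumptions inferred from those three log lines only; this is not the HBase RetryCounter code.

```java
public class BackoffSketch {
    // Assumed base interval: inferred from the log, where the sleep
    // before retry #8 is 256000 ms = 1000 ms * 2^8. Not read from HBase.
    static final long BASE_INTERVAL_MS = 1000L;

    // Sleep before a given retry number, assuming the delay doubles per retry.
    static long sleepBeforeRetryMs(int retry) {
        return BASE_INTERVAL_MS << retry;
    }

    // Total time spent sleeping across an inclusive range of retries.
    static long totalSleepMs(int firstRetry, int lastRetry) {
        long total = 0;
        for (int r = firstRetry; r <= lastRetry; r++) {
            total += sleepBeforeRetryMs(r);
        }
        return total;
    }

    public static void main(String[] args) {
        // Only retries #8 to #10 appear in the quoted log; earlier ones are elided.
        for (int r = 8; r <= 10; r++) {
            System.out.println("retry #" + r + ": " + sleepBeforeRetryMs(r) + " ms");
        }
        System.out.println("total for #8..#10: " + totalSleepMs(8, 10) + " ms");
    }
}
```

The three delays add up to 1792000 ms, just under 30 minutes, which lines up with the log timestamps running from 11:53:19 to 12:23:12.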
On Wed, Mar 6, 2013 at 5:25 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Richard:
> If you can try out the fix from HBASE-8019, that would be great.
>
> Meanwhile, I will run the fix through 0.94 test suite.
>
> Cheers
>
> On Wed, Mar 6, 2013 at 5:19 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Looks like the fix from HBASE-7779 wasn't included.
> > See:
> > https://issues.apache.org/jira/secure/attachment/12568663/7779-v2.txt
> >
> > I have created HBASE-8019 for this issue.
> >
> > Thanks for reporting.
> >
> >
> > On Wed, Mar 6, 2013 at 5:04 PM, Richard Ding <[EMAIL PROTECTED]> wrote:
> >
> >> While trying the snapshot code in the HBase 0.94 branch (should be the same as
> >> 0.94.6RC0), we encountered a problem: HBase region servers take a long time to
> >> shut down (see the log below). This problem, however, doesn't exist in 0.94.5.
> >> It looks like the ZK session is closed in the RegionServerSnapshotManager.stop()
> >> method. This results in a SessionExpiredException when HRegionServer tries to
> >> delete its ephemeral node (deleteMyEphemeralNode).
> >> ... ...
> >> 2013-03-06 11:53:19,767 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 256000ms before retry #8...
> >> 2013-03-06 11:57:35,806 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 11:57:35,806 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 512000ms before retry #9...
> >> 2013-03-06 12:06:07,882 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 12:06:07,882 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1024000ms before retry #10...
> >> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> 2013-03-06 12:23:12,034 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 10 retries
> >> 2013-03-06 12:23:12,034 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Failed deleting my ephemeral node
> >> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hdtest010.svl.ibm.com,60020,1362529262252
> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
> >> at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:133)
> >> at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:999)
> >> at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:988)
> >> at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1097)
> >> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:875)
> >> at java.lang.Thread.run(Thread.java:738)
> >> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server hdtest010.svl.ibm.com,60020,1362529262252; zookeeper connection closed.
> >> 2013-03-06 12:23:12,036 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> >> 2013-03-06 12:23:12,039 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting;