Zookeeper >> mail # dev >> ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?


ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?
I looked at the patch for ZOOKEEPER-1059, which should have converted the
NPE into a KeeperException.NoNodeException.

Why would the 'zkcli stat' command return 0 when the HBase master znode has expired?

Advice is appreciated.

FYI, Jon filed a JIRA for the issue below, which is a blocker for HBase trunk.

On Tue, Mar 20, 2012 at 12:36 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:

> I'm trying to test HBASE-5589 (to see whether I can add an API call to
> HMasterInterface and do a rolling restart/upgrade on a live cluster), which
> led me down another rabbit hole.
>
> I'm wondering how the rolling-restart.sh script worked in the past (I could spend
> more time setting up an older version to test this, but figured I'd ask).
>
> I'm getting stuck where bin/rolling-restart.sh waits for the Master ZNode
> to expire.  In this particular case, the script seems to hang there
> forever (even after the /hbase/master ephemeral node has expired).
>
> Here's the code in the script:
> ----
> # make sure the master znode has been deleted before continuing
>    zparent=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.parent`
>    if [ "$zparent" == "null" ]; then zparent="/hbase"; fi
>    zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
>    if [ "$zmaster" == "null" ]; then zmaster="master"; fi
>    zmaster=$zparent/$zmaster
>    echo -n "Waiting for Master ZNode ${zmaster} to expire"
>    while bin/hbase zkcli stat $zmaster >/dev/null 2>&1; do
>      echo -n "."
>      sleep 1
>    done
>    echo #force a newline
> ----
>
> The problem is that 'bin/hbase zkcli stat /hbase/master ...' seems to
> always return $? == 0, regardless of whether the znode is present!
> I've checked with Patrick Hunt (ZK committer), and this is the expected
> behavior.  The only non-zero return codes are for abnormal exits
> (exceptions thrown).
>
> Here's the ZK code I was looking through:
>
> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeperMain.java#L736
>
>
> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeper.java#L980
>
>
> Thoughts?
>
> Jon.
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [EMAIL PROTECTED]
>
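Since the `while bin/hbase zkcli stat $zmaster ...; do` loop in the quoted rolling-restart.sh excerpt tests only the exit status, and that status is 0 whether or not the znode exists, one possible workaround is to inspect the command's output instead. The sketch below is a minimal, hypothetical illustration and not the script's actual fix. It assumes that for a missing path the ZooKeeper CLI prints a line containing "Node does not exist" (verify this against the ZooKeeper version in use). The function names `znode_exists` and `wait_for_znode_expiry` and the timeout parameter are hypothetical additions; the original loop has no timeout.

```shell
#!/usr/bin/env bash
# Minimal sketch: decide whether a znode exists from the *output* of
# "zkcli stat", because its exit status is 0 whether or not the node exists.
# ASSUMPTION: for a missing path the CLI prints a line containing
# "Node does not exist" (check against your ZooKeeper version).

# Hypothetical helper. Usage: znode_exists PATH CMD [ARGS...]
# Runs "CMD ARGS... stat PATH" and returns 1 if the output reports the node
# as missing, 0 otherwise. Connection errors also return 0, so a caller
# keeps waiting instead of proceeding early.
znode_exists() {
  local path="$1"
  shift
  local out
  out=$("$@" stat "$path" 2>&1)
  if grep -q "Node does not exist" <<<"$out"; then
    return 1
  fi
  return 0
}

# Hypothetical helper. Usage: wait_for_znode_expiry PATH TIMEOUT_SECS CMD [ARGS...]
# Polls once per second; returns 0 once the znode is gone, 1 on timeout.
wait_for_znode_expiry() {
  local path="$1" timeout="$2"
  shift 2
  local waited=0
  echo -n "Waiting for ZNode ${path} to expire"
  while znode_exists "$path" "$@"; do
    if (( waited >= timeout )); then
      echo " timed out after ${timeout}s"
      return 1
    fi
    echo -n "."
    sleep 1
    waited=$((waited + 1))
  done
  echo # force a newline
  return 0
}
```

In the script, a call might look like `wait_for_znode_expiry "$zmaster" 300 bin/hbase zkcli`, where 300 is an arbitrary example value. The design choice here is to treat any output that does not explicitly report a missing node as "still present", which errs on the side of waiting longer rather than restarting the master too early.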