Zookeeper >> mail # dev >> ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?


Re: ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?
Patrick:
Appreciate your detailed response.

I haven't finished work on ZOOKEEPER-1407 :-(
So I don't think I have the bandwidth to start working on another ZooKeeper
issue.

Near term, if we can find a way for a shell script to detect the absence
of a particular ZooKeeper node, rolling-restart.sh can be restored.
Otherwise we may need to remove it.
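
One possible near-term approach, sketched below, is to stop relying on the exit status of the shell and look at what it prints instead. This is only a sketch under assumptions: the marker text "Node does not exist" is my guess at the message the patched shell prints for a missing znode and should be checked against the ZooKeeper version in use, and the helper names znode_missing and wait_for_znode_gone are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: treat a znode as gone when the CLI's output says so,
# instead of trusting the process exit status (which stays 0).

# ASSUMPTION: the CLI prints a line containing this text for a missing znode.
MISSING_MARKER="Node does not exist"

# znode_missing CMD [ARGS...]
# Runs CMD and returns 0 if its combined stdout/stderr contains the marker.
znode_missing() {
  "$@" 2>&1 | grep -q "$MISSING_MARKER"
}

# wait_for_znode_gone MAX_TRIES CMD [ARGS...]
# Polls once per second; returns 0 once the znode is reported missing,
# or 1 after MAX_TRIES attempts so the caller never hangs forever.
wait_for_znode_gone() {
  local tries=$1
  shift
  local i
  for ((i = 0; i < tries; i++)); do
    if znode_missing "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

In rolling-restart.sh this could be called as `wait_for_znode_gone 60 bin/hbase zkcli stat "$zmaster"`, where the limit of 60 tries is an arbitrary choice. Matching on message text is fragile, so this would only be a stopgap until a proper command line tool exists.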

FYI: as an HBase committer, I often need to finish incomplete features such as
HBASE-3996.
This takes away a significant amount of time.

Cheers

On Tue, Mar 20, 2012 at 9:16 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 20, 2012 at 6:57 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > I looked at the patch for ZOOKEEPER-1059 which should have converted the
> > NPE to KeeperException.NoNodeException
> >
> > Why would the 'zkcli stat' command return 0 when the HBase master znode
> > expires?
> >
> > Advice is appreciated.
>
> Hi Ted, sorry to see you're having troubles. I think I see the
> disconnect. ZooKeeperMain is first and foremost a user shell. As such
> it should not exit unless the quit command is run (or killed
> explicitly, etc...). In this case ZOOKEEPER-1059 is fixing a bug in
> the shell. It indeed is converting the NPE into a NoNodeException,
> which the shell then converts into an error message to the user, and
> continues. Prior to this patch the shell was failing on the NPE, which
> then generated the non-0 exit from the process.
>
> Note that trunk has some further improvements along these lines that
> you might also run into at some point in the future (3.5+):
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-271
> https://issues.apache.org/jira/browse/ZOOKEEPER-1391
> https://issues.apache.org/jira/browse/ZOOKEEPER-1307
>
> I think what we need is to have a tool that's intended for use both
> programmatically and by humans, with more strict requirements about
> input, output formatting and command handling, etc... Please see the
> work Hartmut has been doing as part of 271 on trunk (3.5.0). Perhaps
> we can augment these new classes to also support such a tool. However,
> it should be a true command-line tool rather than a shell.
> Would you be available to work on this?
>
> Patrick
>
> ps. bigtop is now helping to verify cross project compatibility, it
> would be great if you could introduce some hbase tests that would
> flag these breakages in future. When bigtop does its integration (i.e.
> runs the hbase tests using the corresponding version of zk) it would
> find these problems. We'd catch it much earlier. Thanks!
>
>
> > FYI Jon filed a JIRA for the issue below which is a blocker for HBase
> > trunk.
> >
> > On Tue, Mar 20, 2012 at 12:36 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >
> >> I'm trying to test HBASE-5589 -- to see if I can add an API call to
> >> HMasterInterface and do a rolling-restart / upgrade on a live cluster,
> >> which led me down another rabbit hole.
> >>
> >> I'm wondering how the rolling-restart.sh script worked in the past (I can
> >> spend more time setting up an older version to test this, but figured I'd
> >> ask).
> >>
> >> I'm getting stuck when bin/rolling-restart.sh tries to wait until the
> >> Master ZNode expires.  In this particular case, the script seems to hang
> >> there forever (even after the /hbase/master ephemeral node expires).
> >>
> >> Here's the code in the script:
> >> ----
> >> # make sure the master znode has been deleted before continuing
> >>    zparent=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.parent`
> >>    if [ "$zparent" == "null" ]; then zparent="/hbase"; fi
> >>    zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
> >>    if [ "$zmaster" == "null" ]; then zmaster="master"; fi
> >>    zmaster=$zparent/$zmaster
> >>    echo -n "Waiting for Master ZNode ${zmaster} to expire"
> >>    while bin/hbase zkcli stat $zmaster >/dev/null 2>&1; do
> >>      echo -n "."
> >>      sleep 1
> >>    done
> >>    echo #force a newline