HBase, mail # user - Root Region Not Online after rolling RS Restart


Time Less 2013-03-15, 01:19
ramkrishna vasudevan 2013-03-15, 07:43
Time Less 2013-03-18, 16:39
Re: Root Region Not Online after rolling RS Restart
Ted Yu 2013-03-18, 17:39
If you look at bin/start-hbase.sh, you will see:

  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver

Region servers are started after the master, so the order is reversed when
shutting down the cluster: region servers are stopped first, then the master.
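
The reversed order can be sketched with the same daemon scripts that
start-hbase.sh calls. This is only a rough illustration, not the contents of
any shipped stop script: the default paths, the DRY_RUN switch, and the run and
stop_cluster helpers are assumptions made for the example, and the ZooKeeper
step applies only when HBase manages ZooKeeper itself.

```shell
#!/usr/bin/env bash
# Sketch: stop the HBase daemons in the reverse of the start-hbase.sh order.
# Assumed (not from the thread): the default paths below, the DRY_RUN switch,
# and the helper names run / stop_cluster.

bin="${HBASE_HOME:-/usr/lib/hbase}/bin"
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$bin/../conf}"
HBASE_REGIONSERVERS="${HBASE_REGIONSERVERS:-$HBASE_CONF_DIR/regionservers}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stop_cluster() {
  # 1. Region servers were started last, so they are stopped first.
  run "$bin"/hbase-daemons.sh --config "$HBASE_CONF_DIR" \
    --hosts "$HBASE_REGIONSERVERS" stop regionserver
  # 2. Then the master.
  run "$bin"/hbase-daemon.sh --config "$HBASE_CONF_DIR" stop master
  # 3. ZooKeeper was started first, so it is stopped last.
  run "$bin"/hbase-daemons.sh --config "$HBASE_CONF_DIR" stop zookeeper
}

stop_cluster
```

With DRY_RUN left at 1 the script only prints the three commands, so the order
can be checked before anything is actually stopped.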

On Mon, Mar 18, 2013 at 9:39 AM, Time Less <[EMAIL PROTECTED]> wrote:

> I do thank you for the advice, and I will try it. Is there a quick two- or
> three-sentence summary about why this is the proper order?
>
> I would have thought since the -ROOT- and .META. are on RS, that you'd want
> to stop the master first before stopping the RS. Perhaps I'm thinking of
> services incorrectly, but I always imagine that a supporting function
> should be stopped after the function that it supports. For example, close
> all files before unmounting filesystem. Unmount all filesystems before
> powering down.
>
> Thus, perhaps I'm misunderstanding the dependencies between RS and HMaster.
> Is HMaster supporting RS or vice-versa?
>
>
> On Fri, Mar 15, 2013 at 12:43 AM, ramkrishna vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > Can you try one thing?
> > When you stop the services, do it this way:
> > -> Stop the RS
> > -> Then stop the master.
> >
> > That is always better, I feel.
> >
> > Regards
> > Ram
> >
> > On Fri, Mar 15, 2013 at 6:49 AM, Time Less <[EMAIL PROTECTED]> wrote:
> >
> > > We have a 15-node HBase cluster with RS on same nodes as HDFS DN. We do a
> > > full restart of HBase[1]. Sometimes this works. But sometimes several of
> > > the RS have this in their logs:
> > >
> > > """
> > > regionserverHostname: 2013-03-12 16:48:03,396 DEBUG
> > > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > > locateRegionInMeta parentTable=-ROOT-,
> > > metaLocation={region=-ROOT-,,0.70236052, hostname=hbaseMasterHostname,
> > > port=60020}, attempt=25 of 100 failed; retrying after sleep of 32000
> > > because: org.apache.hadoop.hbase.NotServingRegionException: Region is not
> > > online: -ROOT-,,0"
> > > """
> > >
> > > The HMaster will be failing to find -ROOT- region[2] and will be stalled
> > > starting up.
> > >
> > > The above counter from the logs will continue to increment to attempt
> > > 100/100, then go back down to attempt 1/100 again. This will continue
> > > forever until we delete the stale ZK entry /hbase/root-region-server. As
> > > soon as we do, all RS get back to normal, HBase Master comes up, and life
> > > is good.
> > >
> > > I searched JIRA and mailing lists and didn't find what appeared to be a
> > > precise match. Does anyone have matching experience?
> > >
> > > HBase version: 0.92.1 (CDH4).
> > >
> > > [1] Stop Thrift. Stop HBase Master. Stop all RS. Stop Zookeeper. Reverse
> > > this order for starting.
> > > [2] I forget the precise verbiage from the HBase web UI. I will discover it
> > > next time this happens if it's important, but it seems rather generic.
> > >
> > > --
> > > *Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
> > >
> >
>
>
>
>
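
The workaround described in the original report (deleting the stale ZK entry
/hbase/root-region-server) could be scripted roughly as below. It is a sketch
under assumptions: it uses ZooKeeper's own command-line client (zkCli.sh, which
some distributions package under a different name) rather than anything
HBase-specific, and the ZK_CLI, ZK_SERVER and DRY_RUN variables and the helper
function names are made up for the example. Only the znode path comes from the
thread.

```shell
#!/usr/bin/env bash
# Sketch: inspect, and if needed delete, the znode that records where -ROOT-
# is hosted. Assumed (not from the thread): ZK_CLI, ZK_SERVER, DRY_RUN and
# the helper function names.

ZK_CLI="${ZK_CLI:-zkCli.sh}"              # ZooKeeper command-line client
ZK_SERVER="${ZK_SERVER:-localhost:2181}"  # one member of the ZK quorum
ROOT_ZNODE="/hbase/root-region-server"    # path quoted in the report
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the command

zk() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$ZK_CLI -server $ZK_SERVER $*"
  else
    "$ZK_CLI" -server "$ZK_SERVER" "$@"
  fi
}

# Show what the znode currently holds before touching it.
show_root_location() { zk get "$ROOT_ZNODE"; }

# Remove the entry; in the report this is what unblocked the cluster.
delete_root_location() { zk delete "$ROOT_ZNODE"; }

show_root_location
```

Looking at the znode first, and deleting it only when the server it names is
not actually serving -ROOT-, is the more cautious way to apply the workaround.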