HBase, mail # user - Root Region Not Online after rolling RS Restart


Re: Root Region Not Online after rolling RS Restart
Time Less 2013-03-18, 16:39
I do thank you for the advice, and I will try it. Is there a quick two- or
three-sentence summary about why this is the proper order?

I would have thought since the -ROOT- and .META. are on RS, that you'd want
to stop the master first before stopping the RS. Perhaps I'm thinking of
services incorrectly, but I always imagine that a supporting function
should be stopped after the function that it supports. For example, close
all files before unmounting a filesystem, and unmount all filesystems before
powering down.

Thus, perhaps I'm misunderstanding the dependencies between RS and HMaster.
Is HMaster supporting RS or vice-versa?
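
For reference, the two stop orders being compared in this thread can be written out as a small Python sketch. The service labels are illustrative names, not HBase commands, and reversing the suggested stop order to get a start order is an assumption carried over from the original procedure; the thread does not confirm it.

```python
# Stop order from the original procedure (footnote [1] in the quoted message).
# The strings are illustrative labels only, not real commands.
ORIGINAL_STOP = ["thrift", "master", "regionservers", "zookeeper"]


def stop_regionservers_before_master(order):
    """Return a copy of `order` with the region servers stopped before the master."""
    result = list(order)
    m, r = result.index("master"), result.index("regionservers")
    if m < r:
        result[m], result[r] = result[r], result[m]
    return result


def start_order(stop_order):
    """Assumption: start-up is the stop order reversed, as in the original procedure."""
    return list(reversed(stop_order))


SUGGESTED_STOP = stop_regionservers_before_master(ORIGINAL_STOP)

if __name__ == "__main__":
    print("stop: ", " -> ".join(SUGGESTED_STOP))
    print("start:", " -> ".join(start_order(SUGGESTED_STOP)))
```

Under that assumption the only change from the original procedure is that the master and the region servers trade places, in both the stop and the start sequence.
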
On Fri, Mar 15, 2013 at 12:43 AM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> Can you try one thing?
> When you stop the services, do it this way:
> -> Stop the RS.
> -> Then stop the master.
>
> That is always better, I feel.
>
> Regards,
> Ram
>
> On Fri, Mar 15, 2013 at 6:49 AM, Time Less <[EMAIL PROTECTED]> wrote:
>
> > We have a 15-node HBase cluster with RS on same nodes as HDFS DN. We do a
> > full restart of HBase[1]. Sometimes this works. But sometimes several of
> > the RS have this in their logs:
> >
> > """
> > regionserverHostname: 2013-03-12 16:48:03,396 DEBUG
> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > locateRegionInMeta parentTable=-ROOT-,
> > metaLocation={region=-ROOT-,,0.70236052, hostname=hbaseMasterHostname,
> > port=60020}, attempt=25 of 100 failed; retrying after sleep of 32000
> > because: org.apache.hadoop.hbase.NotServingRegionException: Region is not
> > online: -ROOT-,,0"
> > """
> >
> > The HMaster then fails to find the -ROOT- region[2] and stalls while
> > starting up.
> >
> > The above counter from the logs will continue to increment to attempt
> > 100/100, then go back down to attempt 1/100 again. This will continue
> > forever until we delete the stale ZK entry /hbase/root-region-server. As
> > soon as we do, all RS get back to normal, HBase Master comes up, and life
> > is good.
> >
> > I searched JIRA and mailing lists and didn't find what appeared to be a
> > precise match. Does anyone have matching experience?
> >
> > HBase version: 0.92.1 (CDH4).
> >
> > [1] Stop Thrift. Stop HBase Master. Stop all RS. Stop Zookeeper. Reverse
> > this order for starting.
> > [2] I forget the precise verbiage from the HBase web UI. I will discover it
> > next time this happens if it's important, but it seems rather generic.
> >
> > --
> > *Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
> >
>
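
The retry counter described in the quoted message (climbing to attempt 100 of 100, then restarting at 1) can be spotted mechanically in a region server log. Below is a minimal Python sketch; the regular expression is written against the single log line quoted in this thread, so the exact format is an assumption, and the unit of the sleep value is not stated in the thread.

```python
import re

# Pattern modelled on the one log line quoted in this thread (an assumption
# about the general format). \s+ tolerates line wrapping in pasted logs.
ATTEMPT_RE = re.compile(
    r"attempt=(\d+)\s+of\s+(\d+)\s+failed;\s+retrying\s+after\s+sleep\s+of\s+(\d+)"
)

# The log line from the quoted message, joined onto one line.
SAMPLE = (
    "locateRegionInMeta parentTable=-ROOT-, "
    "metaLocation={region=-ROOT-,,0.70236052, hostname=hbaseMasterHostname, "
    "port=60020}, attempt=25 of 100 failed; retrying after sleep of 32000 "
    "because: org.apache.hadoop.hbase.NotServingRegionException: "
    "Region is not online: -ROOT-,,0"
)


def parse_attempts(log_text):
    """Return (attempt, max_attempts, sleep) tuples in order of appearance."""
    return [(int(a), int(n), int(s)) for a, n, s in ATTEMPT_RE.findall(log_text)]


def count_wraparounds(attempts):
    """Count how often the attempt number drops, i.e. the retry loop restarted."""
    wraps = 0
    for (prev, _, _), (cur, _, _) in zip(attempts, attempts[1:]):
        if cur < prev:
            wraps += 1
    return wraps


if __name__ == "__main__":
    print(parse_attempts(SAMPLE))
```

A nonzero wraparound count would match the endless loop described above; in this thread that loop only ended once the stale /hbase/root-region-server entry in ZK was deleted.
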

--
*Tim Ellis: *Fifth Sigma, Inc. Multimedia and Technology++
*Contact: *[EMAIL PROTECTED], 510-761-6610
*Urgent Contact:* [EMAIL PROTECTED] (gtalk preferred. if email, CC
no-one)