HBase, mail # user - RS crash upon replication


Re: RS crash upon replication
Amit Mor 2013-05-22, 21:42
Yes, indeed: hyphens are part of the host name (annoying legacy stuff in
my company). The version is hbase-0.94.2-cdh4.2.1. I have no idea whether
0.94.6 was backported by Cloudera into their flavor of 0.94.2, but
the mysterious occurrence of the percent sign in zkcli (ls
/hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
might be a sign of such a problem. How deep should my rmr in zkcli go (an
example would be most welcome :)? I have no serious problem running
copyTable with a time period corresponding to the outage and then starting
the sync back up again. One question, though: how did it cause a crash?
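
For reference, the percent sign in the znode name above is ordinary URL encoding: %2C decodes to a comma. The sketch below decodes the WAL znode name and shows why splitting the queue znode name on "-" becomes ambiguous when host names themselves contain hyphens. It is only an illustration: SERVER_NAME, decode_wal_name and parse_queue_servers are hypothetical names, and the regex is an assumption about the host,port,startcode layout visible in the paths, not HBase's own parser.

```python
import re
from urllib.parse import unquote

# Znode path copied verbatim from the message above.
ZNODE = (
    "/hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/"
    "1-va-p-hbase-02-e,60020,1369042377129"
    "-va-p-hbase-02-c,60020,1369042377731"
    "-va-p-hbase-02-d,60020,1369233252475/"
    "va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895"
)

# Assumed layout: host,port,startcode followed by "-" or end of string.
SERVER_NAME = re.compile(r"([^,]+?),(\d+),(\d+)(?:-|$)")


def decode_wal_name(name):
    """URL-decode a WAL znode name (%2C becomes ',')."""
    return unquote(name)


def parse_queue_servers(queue):
    """Split '<peer>-<server>-<server>...' without breaking hyphenated hosts."""
    peer_id, _, rest = queue.partition("-")
    return peer_id, SERVER_NAME.findall(rest)


# Path components: '', hbase, replication, rs, owner, queue, wal
owner, queue, wal = ZNODE.split("/")[4:]

decoded_wal = decode_wal_name(wal)
naive_pieces = queue.split("-")  # hyphens inside host names split too
peer_id, servers = parse_queue_servers(queue)

print(decoded_wal)
print(len(naive_pieces), "pieces from a naive split on '-'")
for host, port, startcode in servers:
    print(peer_id, host, port, startcode)
```

The naive split cuts each host name into several fragments, while anchoring on the two numeric fields after each comma recovers the three dead region servers named in the queue.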
On Thu, May 23, 2013 at 12:32 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> I believe there were cascading failures that produced these deeply nested
> nodes containing WAL(s) still to be replicated. I suspect there is either a
> parsing bug or something else that is causing the replication source not to
> work. Also, which version are you using? Does it have
> https://issues.apache.org/jira/browse/HBASE-8207, since you use hyphens in
> your paths? One way to get back up is to delete these nodes, but then you
> lose the data in these WAL(s)...
>
>
> On Wed, May 22, 2013 at 2:22 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
>
> >  va-p-hbase-02-d,60020,1369249862401
> >
> >
> > On Thu, May 23, 2013 at 12:20 AM, Varun Sharma <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Basically
> > >
> > > ls /hbase/rs and what do you see for va-p-02-d ?
> > >
> > >
> > > On Wed, May 22, 2013 at 2:19 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > >
> > > > Can you do ls /hbase/rs and see what you get for 02-d? Instead of looking
> > > > in /replication/, could you look in /hbase/replication/rs? I want to see
> > > > whether the timestamps match.
> > > >
> > > > Varun
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 2:17 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> I see, so it looks okay. There's just a lot of deep nesting in there. If
> > > >> you look into these nodes by doing ls, you should see a bunch of
> > > >> WAL(s) which still need to be replicated...
> > > >>
> > > >> Varun
> > > >>
> > > >>
> > > > >> On Wed, May 22, 2013 at 2:16 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> 2013-05-22 15:31:25,929 WARN
> > > >>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
> > > transient
> > > >>> ZooKeeper exception:
> > > >>> org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > >>> KeeperErrorCode = Session expired for *
> > > >>>
> > >
> >
> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719
> > > >>> *
> > > >>> *
> > > >>> *
> > > >>> *01->[01->02->02]->01*
> > > >>>
> > > >>> *Looks like a bunch of cascading failures causing this deep
> > nesting...
> > > *
> > > >>>
> > > >>>
> > > > >>> On Wed, May 22, 2013 at 2:09 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> > > >>>
> > > >>>> empty return:
> > > >>>>
> > > >>>> [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls
> > > >>>> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > > >>>> []
> > > >>>>
> > > >>>>
> > > >>>>
> > > > >>>> On Thu, May 23, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > >>>>
> > > > >>>> > Do an "ls", not a "get", here and give the output.
> > > >>>> >
> > > >>>> > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > > >>>> >
> > > >>>> >
> > > >>>> > On Wed, May 22, 2013 at 1:53 PM, [EMAIL PROTECTED] <
> > > >>>> > [EMAIL PROTECTED]> wrote:
> > > >>>> >
> > > >>>> > > [zk: va-p-zookeeper-01-c:2181(CONNECTED) 3] get
> > > >>>> > > /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1