HBase >> mail # user >> RS crash upon replication


Re: RS crash upon replication
But wouldn't a CopyTable between timestamps bring you back? Since the mutations
are all timestamp-based, we should be okay. Basically, run a CopyTable over a
time range that supersedes the downtime interval?
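A minimal sketch of the reasoning behind this suggestion, under a deliberately simplified model: a table is a dict keyed by (row, column, timestamp), and deletes, TTLs and max-versions limits are ignored. Because a cell copied with its original timestamp lands on the same key, re-copying a window that overlaps already-replicated edits just rewrites identical cells, so the copy is idempotent. The window is half-open [start, end), as HBase scan time ranges are. The `copy_window` and `copytable_command` helpers, the table name and the `--peer.adr` value are hypothetical; the `--starttime`, `--endtime` and `--peer.adr` flags are assumed from 0.94-era CopyTable usage and should be checked against your version.

```python
from typing import Dict, List, Tuple

# Simplified model: (row, column, timestamp) -> value. Deletes are ignored.
CellKey = Tuple[str, str, int]
Table = Dict[CellKey, str]


def copy_window(src: Table, dst: Table, start_ms: int, end_ms: int) -> Table:
    """Copy every cell with start_ms <= ts < end_ms from src into dst (mutates dst)."""
    for (row, col, ts), value in src.items():
        if start_ms <= ts < end_ms:
            # Same key as an already-replicated cell: overwritten with the same value.
            dst[(row, col, ts)] = value
    return dst


def copytable_command(table: str, start_ms: int, end_ms: int, peer_adr: str) -> List[str]:
    """Build a CopyTable invocation covering the outage window (flags assumed, see above)."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
        f"--starttime={start_ms}",
        f"--endtime={end_ms}",
        f"--peer.adr={peer_adr}",
        table,
    ]
```

Running `copy_window` twice over the same window leaves the destination unchanged the second time; that is the property the question relies on.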
On Thu, May 23, 2013 at 9:48 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> fwiw stop_replication is a kill switch, not a general way to start and
> stop replicating, and start_replication may put you in an inconsistent
> state:
>
> hbase(main):001:0> help 'stop_replication'
> Stops all the replication features. The state in which each
> stream stops in is undetermined.
> WARNING:
> start/stop replication is only meant to be used in critical load
> situations.
>
> On Thu, May 23, 2013 at 1:17 AM, Amit Mor <[EMAIL PROTECTED]> wrote:
> > No, the servers came out fine only because, after the crash (the RSs
> > crashed; the masters were still running), I immediately pulled the brakes
> > with stop_replication. Then I started the RSs and they came back fine (not
> > replicating). Once I hit 'start_replication' again, they crashed for the
> > second time. Eventually I deleted the heavily nested replication znodes
> > and 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH
> > with the Cloudera Manager Parcels thing, and I'm still trying to figure out
> > how to replace their jars with mine in a clean and non-intrusive way.
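Why the replication znodes end up "heavily nested": in the znode path Amit quotes further down, the queue id is the peer id (`1`) followed by the names of several dead region servers, joined by `-`. A minimal model, assuming each takeover of an orphaned queue appends the dead server's `host,port,startcode` name to the queue id (consistent with that path, but a simplification, not HBase's code). `claim_queue` and `failover_chain` are hypothetical names.

```python
from typing import Iterable


def claim_queue(queue_id: str, dead_server: str) -> str:
    """Id a surviving RS gives a queue it takes over from dead_server (simplified model)."""
    return f"{queue_id}-{dead_server}"


def failover_chain(peer_id: str, dead_servers: Iterable[str]) -> str:
    """Apply successive takeovers: every failover appends one more server name."""
    queue_id = peer_id
    for server in dead_servers:
        queue_id = claim_queue(queue_id, server)
    return queue_id
```

With hostnames like `va-p-hbase-02-e`, each takeover adds several more hyphens, which is exactly what trips up a parser that splits the id on `-`.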
> >
> >
> > On Thu, May 23, 2013 at 10:33 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> >> Actually, it seems like something else was wrong here: the servers came up
> >> just fine when I tried again, so I could not really reproduce the issue.
> >>
> >> Amit: did you try patching 8207?
> >>
> >> Varun
> >>
> >>
> >> On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> >>
> >> > That sounds like a bug for sure. Could you create a JIRA with logs, a
> >> > znode dump, and steps to reproduce it?
> >> >
> >> > Thanks,
> >> > himanshu
> >> >
> >> >
> >> > On Wed, May 22, 2013 at 5:01 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > It seems I can reproduce this: I did a few rolling restarts and got
> >> > > screwed with NoNode exceptions. I am running 0.94.7, which has the fix,
> >> > > but my node names don't contain hyphens. The nodes are no longer coming
> >> > > back up...
> >> > >
> >> > > Thanks
> >> > > Varun
> >> > >
> >> > >
> >> > > On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > I'd suggest patching the code with 8207; cdh4.2.1 doesn't have it.
> >> > > >
> >> > > > With hyphens in the name, ReplicationSource gets confused and tries to
> >> > > > set data on a znode that doesn't exist.
> >> > > >
> >> > > > Thanks,
> >> > > > Himanshu
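To see how hyphens confuse a parser, here is a sketch contrasting a hyphen-unaware split with a hyphen-tolerant parse of the recovered-queue id from this thread. It is an illustrative model only, not HBase's ReplicationSource code and not necessarily how the 8207 patch does it. It assumes the id has the form `<peerId>-<server1>-<server2>-...`, that the peer id contains no `-`, and that every server name ends in `,<port>,<startcode>`. `naive_parse`, `parse_queue_id` and `SERVER_RE` are hypothetical names.

```python
import re
from typing import List, Tuple

# A server name is "<host>,<port>,<startcode>"; it ends at the next '-' or at the end.
SERVER_RE = re.compile(r"([^,]+),(\d+),(\d+)(?:-|$)")


def naive_parse(queue_id: str) -> Tuple[str, List[str]]:
    """Hyphen-unaware parse: treats every '-' as a separator."""
    parts = queue_id.split("-")
    return parts[0], parts[1:]


def parse_queue_id(queue_id: str) -> Tuple[str, List[str]]:
    """Hyphen-tolerant parse: a server name only ends after ',<port>,<startcode>'."""
    peer_id, _, rest = queue_id.partition("-")
    servers: List[str] = []
    pos = 0
    while pos < len(rest):
        m = SERVER_RE.match(rest, pos)
        if m is None:
            raise ValueError(f"unparseable server list: {rest[pos:]!r}")
        servers.append(f"{m.group(1)},{m.group(2)},{m.group(3)}")
        pos = m.end()  # always advances, since [^,]+ consumes at least one char
    return peer_id, servers
```

On the id `1-va-p-hbase-02-e,...` the naive split yields fifteen fragments such as `va` and `p` instead of three server names; any znode path built from those fragments would point at nodes that don't exist, which is one plausible route to the NoNode errors described above.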
> >> > > >
> >> > > >
> >> > > > On Wed, May 22, 2013 at 2:42 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > > Yes, indeed: hyphens are part of the host name (annoying legacy
> >> > > > > stuff in my company). It's hbase-0.94.2-cdh4.2.1. I have no idea
> >> > > > > whether 0.94.6 was backported by Cloudera into their flavor of
> >> > > > > 0.94.2, but the mysterious occurrence of the percent sign in zkcli
> >> > > > >
> >> > > > > ls /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895
> >> > > > >
> >> > > > > might be a sign of such a problem. How deep should my rmr in zkcli
> >> > > > > be? (An example would be most welcome :) I have no serious problem
> >> > > > > running copyTable with a time period corresponding to the outage and
> >> > > > > then starting the sync again. One question though: how did it cause
> >> > > > > a crash?
> >> > > > >
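A sketch that breaks the znode path above into its levels, to make the "how deep" question concrete. It assumes the layout `/hbase/replication/rs/<regionserver>/<queueId>/<hlogName>`, read off this one path. The `%2C` in the last component is the URL-encoded form of a comma, so decoding it recovers a `host,port,startcode.timestamp` log name. Which level to pass to `rmr` is an operational call the thread does not settle; the sketch only shows where each level sits. `split_replication_path` is a hypothetical helper.

```python
from urllib.parse import unquote


def split_replication_path(path: str) -> dict:
    """Split a replication znode path into its levels (layout assumed, see above)."""
    parts = path.strip("/").split("/")
    if len(parts) != 6 or parts[:3] != ["hbase", "replication", "rs"]:
        raise ValueError(f"unexpected replication path: {path!r}")
    regionserver, queue_id, hlog = parts[3:]
    return {
        "regionserver_znode": "/" + "/".join(parts[:4]),
        "queue_znode": "/" + "/".join(parts[:5]),
        "regionserver": regionserver,
        "queue_id": queue_id,
        # '%2C' decodes to ','
        "hlog": unquote(hlog),
    }
```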
> >> > > > >
> >> > > > > On Thu, May 23, 2013 at 12:32 AM, Varun Sharma <[EMAIL PROTECTED]>