NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> RS crash upon replication


Re: RS crash upon replication
But wouldn't a CopyTable between timestamps bring you back? Since the
mutations are all timestamp based, we should be okay. Basically, do a
CopyTable whose time range covers the downtime interval?
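A minimal sketch of that backfill, assuming it is run with the stock CopyTable MapReduce job (org.apache.hadoop.hbase.mapreduce.CopyTable) and its --starttime, --endtime and --peer.adr options. The helper name copyTableArgs, the one-minute margin, the ZooKeeper quorum string and the table name are all hypothetical. It widens the outage window on both sides so edits near the edges are not missed; because each copied cell keeps its original timestamp, re-copying cells the peer already has should just rewrite identical versions.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyTableWindow {
    // Hypothetical helper: builds CopyTable arguments whose time range covers the
    // outage widened by marginMs on each side. HBase scan time ranges are
    // [start, end), so the end bound is exclusive.
    static List<String> copyTableArgs(long outageStartMs, long outageEndMs,
                                      long marginMs, String peerAddress, String table) {
        if (outageEndMs < outageStartMs || marginMs < 0) {
            throw new IllegalArgumentException("invalid outage window or margin");
        }
        long start = Math.max(0L, outageStartMs - marginMs);
        long end = outageEndMs + marginMs;
        List<String> args = new ArrayList<>();
        args.add("--starttime=" + start);
        args.add("--endtime=" + end);
        args.add("--peer.adr=" + peerAddress); // peer cluster: quorum:port:parent znode
        args.add(table);
        return args;
    }

    public static void main(String[] argv) {
        // Illustrative epoch-millisecond values, not taken from the thread.
        List<String> args = copyTableArgs(1_369_200_000_000L, 1_369_250_000_000L, 60_000L,
                "zk1,zk2,zk3:2181:/hbase", "my_table");
        System.out.println("hbase org.apache.hadoop.hbase.mapreduce.CopyTable "
                + String.join(" ", args));
    }
}
```

Whether deletes issued during the outage are carried over depends on the CopyTable version in use, so that is worth checking before relying on the backfill alone.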
On Thu, May 23, 2013 at 9:48 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> fwiw stop_replication is a kill switch, not a general way to start and
> stop replicating, and start_replication may put you in an inconsistent
> state:
>
> hbase(main):001:0> help 'stop_replication'
> Stops all the replication features. The state in which each
> stream stops in is undetermined.
> WARNING:
> start/stop replication is only meant to be used in critical load
> situations.
>
> On Thu, May 23, 2013 at 1:17 AM, Amit Mor <[EMAIL PROTECTED]> wrote:
> > No, the servers came out fine only because, right after the crash (of the
> > RSs; the masters were still running), I immediately pulled the brakes
> > with stop_replication. Then I started the RSs and they came back fine
> > (not replicating). Once I hit 'start_replication' again, they crashed for
> > the second time. Eventually I deleted the heavily nested replication
> > znodes and 'start_replication' succeeded. I didn't patch 8207 because I'm
> > on CDH with the Cloudera Manager Parcels thing, and I'm still trying to
> > figure out how to replace their jars with mine in a clean, non-intrusive
> > way.
> >
> >
> > On Thu, May 23, 2013 at 10:33 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> >> Actually, it seems like something else was wrong here: the servers came
> >> up just fine on trying again, so I could not really reproduce the issue.
> >>
> >> Amit: did you try patching 8207?
> >>
> >> Varun
> >>
> >>
> >> On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> >>
> >> > That sounds like a bug for sure. Could you create a JIRA with the logs,
> >> > a znode dump, and steps to reproduce it?
> >> >
> >> > Thanks,
> >> > himanshu
> >> >
> >> >
> >> > On Wed, May 22, 2013 at 5:01 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > It seems I can reproduce this: I did a few rolling restarts and got
> >> > > screwed with NoNode exceptions. I am running 0.94.7, which has the
> >> > > fix, but my node names don't contain hyphens, and the nodes are no
> >> > > longer coming back up...
> >> > >
> >> > > Thanks
> >> > > Varun
> >> > >
> >> > >
> >> > > On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > I'd suggest patching the code with 8207; cdh4.2.1 doesn't have it.
> >> > > >
> >> > > > With hyphens in the name, ReplicationSource gets confused and tries
> >> > > > to set data in a znode which doesn't exist.
> >> > > >
> >> > > > Thanks,
> >> > > > Himanshu
> >> > > >
> >> > > >
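Himanshu's explanation can be made concrete. Going by the znode path quoted further down in this thread, a recovered queue's id looks like the peer id followed by each dead region server's name (host,port,startcode), joined with '-'. The sketch below (class and method names are hypothetical, and it is not the actual 8207 patch) shows why splitting that id on every '-' falls apart once host names contain hyphens, and one way to parse it by matching whole server names instead. It assumes the peer id itself contains no '-'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecoveredQueueId {
    // A server name rendered as "host,port,startcode"; the host may itself contain '-'.
    // Each match must be followed by the '-' separator or the end of the string.
    private static final Pattern SERVER_NAME =
            Pattern.compile("([^,]+),(\\d+),(\\d+)(?:-|$)");

    // Naive parse: split on every '-' and treat everything after the peer id as a
    // server name. Breaks as soon as a host name contains a hyphen.
    static List<String> naiveDeadServers(String queueId) {
        String[] parts = queueId.split("-");
        return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
    }

    // Anchored parse: the peer id is everything before the first '-', and each
    // dead server is matched as a whole "host,port,startcode" triple.
    static List<String> deadServers(String queueId) {
        List<String> servers = new ArrayList<>();
        int firstDash = queueId.indexOf('-');
        if (firstDash < 0) {
            return servers; // no dead servers encoded in the id
        }
        Matcher m = SERVER_NAME.matcher(queueId.substring(firstDash + 1));
        while (m.find()) {
            servers.add(m.group(1) + "," + m.group(2) + "," + m.group(3));
        }
        return servers;
    }

    public static void main(String[] args) {
        String id = "1-va-p-hbase-02-e,60020,1369042377129"
                + "-va-p-hbase-02-c,60020,1369042377731"
                + "-va-p-hbase-02-d,60020,1369233252475";
        System.out.println("naive:    " + naiveDeadServers(id));
        System.out.println("anchored: " + deadServers(id));
    }
}
```

With the queue id from that path, the naive split yields 15 fragments such as "va" and "p", while the anchored parse yields the three expected server names.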
> >> > > > On Wed, May 22, 2013 at 2:42 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> >> > > >
> >> > > > > Yes, indeed: hyphens are part of the host name (annoying legacy
> >> > > > > stuff in my company). It's hbase-0.94.2-cdh4.2.1. I have no idea
> >> > > > > whether 0.94.6 was backported by Cloudera into their flavor of
> >> > > > > 0.94.2, but the mysterious occurrence of the percent sign in zkcli
> >> > > > > (ls /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
> >> > > > > might be a sign of such a problem. How deep should my rmr in zkcli
> >> > > > > be? (An example would be most welcome :) I have no serious problem
> >> > > > > running copyTable with a time period corresponding to the outage
> >> > > > > and then starting the sync back again. One question though: how
> >> > > > > did it cause a crash?
> >> > > > >
> >> > > > >
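On the percent sign: in the path above, the last level is a write-ahead log (HLog) file name in which the server name appears to be URL-encoded, as java.net.URLEncoder would produce, so %2C is just an encoded comma. A minimal sketch, with hypothetical class and method names, that splits such a path into its three levels under /hbase/replication/rs (region server, queue id, log name) and decodes the log name with java.net.URLDecoder. It does not answer how deep the rmr should go; it only shows which level is which.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ReplicationZnodePath {
    static final String RS_ROOT = "/hbase/replication/rs";

    // Splits "<RS_ROOT>/<region server>/<queue id>/<log name>" into its three levels.
    static String[] levels(String path) {
        String prefix = RS_ROOT + "/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("not under " + RS_ROOT + ": " + path);
        }
        String[] parts = path.substring(prefix.length()).split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected rs/queue/log, got " + parts.length + " levels");
        }
        return parts;
    }

    // Undoes the URL encoding of the log name ("%2C" -> ","). URLDecoder also turns
    // '+' into a space, which does not matter for names like these.
    static String decodeLogName(String logName) {
        try {
            return URLDecoder.decode(logName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String path = RS_ROOT + "/va-p-hbase-02-d,60020,1369249862401"
                + "/1-va-p-hbase-02-e,60020,1369042377129"
                + "-va-p-hbase-02-c,60020,1369042377731"
                + "-va-p-hbase-02-d,60020,1369233252475"
                + "/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895";
        String[] lv = levels(path);
        System.out.println("region server: " + lv[0]);
        System.out.println("queue id:      " + lv[1]);
        System.out.println("log (decoded): " + decodeLogName(lv[2]));
    }
}
```

For the path in the thread, the decoded log name begins with va-p-hbase-02-e,60020,1369042377129, the first dead server listed in the queue id.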
> >> > > > > On Thu, May 23, 2013 at 12:32 AM, Varun Sharma <[EMAIL PROTECTED]>