HBase >> mail # user >> RS crash upon replication


Re: RS crash upon replication
I'd suggest patching the code with HBASE-8207; cdh4.2.1 doesn't have it.

With hyphens in the host name, ReplicationSource gets confused and tries to
set data in a znode which doesn't exist.

Thanks,
Himanshu
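
To illustrate the failure mode described above: the queue znode names quoted in this thread appear to have the form `<peerId>-<server>-<server>...`, where each server is `host,port,startcode`. The sketch below is plain Python, not HBase code, and the names `naive_split` and `parse_queue_name` are hypothetical. It only shows why splitting on every hyphen misreads a host such as va-p-hbase-02-e, and one possible way to parse the name by anchoring on the `,port,startcode` suffix instead. Whether HBASE-8207 takes this approach is not stated in the thread.

```python
import re

# Queue znode name copied from the zkcli listing quoted in this thread.
QUEUE = ("1-va-p-hbase-02-e,60020,1369042377129"
         "-va-p-hbase-02-c,60020,1369042377731"
         "-va-p-hbase-02-d,60020,1369233252475")

# Assumed server-name shape: host,port,startcode, followed by '-' or end of string.
SERVER_RE = re.compile(r"([^,]+),(\d+),(\d+)(?:-|$)")


def naive_split(queue_name):
    """Split on every hyphen; hyphenated host names get chopped into fragments."""
    return queue_name.split("-")


def parse_queue_name(queue_name):
    """Return (peer_id, [server, ...]), tolerating hyphens inside host names."""
    # The peer id is everything before the first hyphen.
    peer_id, _, rest = queue_name.partition("-")
    # Each match consumes one full host,port,startcode triple.
    servers = [",".join(m.groups()) for m in SERVER_RE.finditer(rest)]
    return peer_id, servers


if __name__ == "__main__":
    print(naive_split(QUEUE))
    print(parse_queue_name(QUEUE))
```

The naive split yields fragments like `va`, `p` and `hbase`, none of which names a real server. Looking up a znode built from such fragments would fail in the way described above.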
On Wed, May 22, 2013 at 2:42 PM, Amit Mor <[EMAIL PROTECTED]> wrote:

> Yes, indeed - hyphens are part of the host name (annoying legacy stuff in
> my company). It's hbase-0.94.2-cdh4.2.1. I have no idea whether 0.94.6 was
> backported by Cloudera into their flavor of 0.94.2, but
> the mysterious occurrence of the percent sign in zkcli (ls
> /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
> might be a sign of such a problem. How deep should my rmr in zkcli be (an
> example would be most welcome :) ? I have no serious problem running
> copyTable with a time period corresponding to the outage and then restarting
> the sync. One question though: how did it cause a crash?
>
>
> On Thu, May 23, 2013 at 12:32 AM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
>
> > I believe there were cascading failures which produced these deep nodes
> > containing still-to-be-replicated WAL(s). I suspect there is either some
> > parsing bug or something else which is causing the replication source to
> > not work. Also, which version are you using, and does it have
> > https://issues.apache.org/jira/browse/HBASE-8207, since you use hyphens in
> > your paths? One way to get back up is to delete these nodes, but then you
> > lose the data in these WAL(s)...
> >
> >
> > On Wed, May 22, 2013 at 2:22 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> >
> > >  va-p-hbase-02-d,60020,1369249862401
> > >
> > >
> > > On Thu, May 23, 2013 at 12:20 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > >
> > > > Basically
> > > >
> > > > ls /hbase/rs and what do you see for va-p-02-d ?
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 2:19 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Can you do ls /hbase/rs and see what you get for 02-d - instead of looking
> > > > > in /replication/, could you look in /hbase/replication/rs - I want to see
> > > > > if the timestamps are matching or not ?
> > > > >
> > > > > Varun
> > > > >
> > > > >
> > > > > On Wed, May 22, 2013 at 2:17 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> I see - so it looks okay - there's just a lot of deep nesting in there - if
> > > > >> you look into these nodes by doing ls - you should see a bunch of
> > > > >> WAL(s) which still need to be replicated...
> > > > >>
> > > > >> Varun
> > > > >>
> > > > >>
> > > > > >> On Wed, May 22, 2013 at 2:16 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > > >>
> > > > > >>> 2013-05-22 15:31:25,929 WARN
> > > > > >>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> > > > > >>> ZooKeeper exception:
> > > > > >>> org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > > > >>> KeeperErrorCode = Session expired for
> > > > > >>> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719
> > > > > >>>
> > > > > >>> 01->[01->02->02]->01
> > > > > >>>
> > > > > >>> Looks like a bunch of cascading failures causing this deep nesting...
> > > > >>>
> > > > >>>
> > > > > >>> On Wed, May 22, 2013 at 2:09 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> > > > >>>
> > > > >>>> empty return:
> > > > >>>>
> > > > >>>> [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls
> > > > >>>> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > > > >>>> []
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Thu, May 23, 2013 at 12:05 AM, Varun Sharma <
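
On the percent sign asked about in the quoted messages: `%2C` is the standard percent-encoding of a comma, so the leaf znode names seen in zkcli read as a server name with its commas encoded, followed by a dot and a number. The sketch below decodes one of the names quoted in this thread. The helper name `split_wal_znode` is hypothetical, and treating the trailing number as a WAL timestamp is an assumption rather than something the thread confirms.

```python
from urllib.parse import unquote

# Leaf znode name copied from the zkcli path quoted in this thread.
WAL_ZNODE = "va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895"


def split_wal_znode(name):
    """Decode %2C back to ',' and split into (host, port, startcode, suffix)."""
    decoded = unquote(name)
    # The numeric suffix follows the last dot; host names may themselves contain dots.
    server_name, _, suffix = decoded.rpartition(".")
    host, port, startcode = server_name.split(",")
    # Assumption: the suffix is the WAL's creation timestamp in milliseconds.
    return host, int(port), int(startcode), int(suffix)


if __name__ == "__main__":
    print(split_wal_znode(WAL_ZNODE))
```

Decoded this way, the name is an ordinary `host,port,startcode` triple plus a suffix, so the percent sign on its own does not indicate corruption.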