HBase >> mail # user >> RS crash upon replication


Re: RS crash upon replication
That sounds like a bug for sure. Could you create a jira with logs/znode
dump/steps to reproduce it?

Thanks,
himanshu
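
A minimal sketch of the naming ambiguity discussed in the quoted messages below, using the znode names copied from the `ls` output in the thread. It is not HBase's actual parsing code and does not claim to reproduce the HBASE-8207 fix. The helper names (`naive_split`, `hyphen_aware_split`) are hypothetical. It assumes a recovered queue znode is named "peerId-server-server..." with each server written as "host,port,startcode", which is what the path in the thread looks like. It also shows that the "mysterious" percent sign is just a URL-encoded comma (`%2C`).

```python
import re
from urllib.parse import unquote

# Names copied from the zkcli output quoted in this thread.
QUEUE_ZNODE = (
    "1-va-p-hbase-02-e,60020,1369042377129"
    "-va-p-hbase-02-c,60020,1369042377731"
    "-va-p-hbase-02-d,60020,1369233252475"
)
WAL_ZNODE = "va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895"

# Assumption: a server name is "host,port,startcode" and the host may contain hyphens.
SERVER_NAME = re.compile(r"[^,]+,\d+,\d+")


def naive_split(queue_znode):
    """Split on every hyphen; only safe when hostnames contain no hyphens."""
    peer_id, *rest = queue_znode.split("-")
    return peer_id, rest


def hyphen_aware_split(queue_znode):
    """Peel off the peer id, then consume one 'host,port,startcode' at a time."""
    peer_id, _, tail = queue_znode.partition("-")
    servers = []
    while tail:
        match = SERVER_NAME.match(tail)
        if match is None:
            raise ValueError("unexpected queue znode name: " + queue_znode)
        servers.append(match.group(0))
        tail = tail[match.end():]
        if tail.startswith("-"):
            tail = tail[1:]  # drop the separator between server names
    return peer_id, servers


if __name__ == "__main__":
    print(naive_split(QUEUE_ZNODE))         # hostnames are chopped into fragments
    print(hyphen_aware_split(QUEUE_ZNODE))  # three intact server names
    print(unquote(WAL_ZNODE))               # %2C decodes to ","
```

With the naive split, the three hyphenated hostnames break into fragments such as `va`, `p` and `hbase`, so any znode path rebuilt from those pieces would point at a node that does not exist.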
On Wed, May 22, 2013 at 5:01 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> It seems I can reproduce this - I did a few rolling restarts and got
> screwed with NoNode exceptions - I am running 0.94.7 which has the fix but
> my nodes don't contain hyphens - nodes are no longer coming back up...
>
> Thanks
> Varun
>
>
> On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
>
> > I'd suggest patching the code with HBASE-8207; cdh4.2.1 doesn't have it.
> >
> > With hyphens in the name, ReplicationSource gets confused and tries to set
> > data in a znode which doesn't exist.
> >
> > Thanks,
> > Himanshu
> >
> >
> > On Wed, May 22, 2013 at 2:42 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> >
> > > yes, indeed - hyphens are part of the host name (annoying legacy stuff in
> > > my company). It's hbase-0.94.2-cdh4.2.1. I have no idea if 0.94.6 was
> > > backported by Cloudera into their flavor of 0.94.2, but
> > > the mysterious occurrence of the percent sign in zkcli (ls
> > > /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
> > > might be a sign of such a problem. How deep should my rmr in zkcli be (an
> > > example would be most welcome :) ? I have no serious problem running
> > > copyTable with a time period corresponding to the outage and then
> > > starting the sync back again. One question though, how did it cause a crash?
> > >
> > >
> > > On Thu, May 23, 2013 at 12:32 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > >
> > > > I believe there were cascading failures which created these deeply nested
> > > > nodes containing WAL(s) still to be replicated - I suspect there is either
> > > > some parsing bug or something which is causing the replication source to
> > > > not work - also which version are you using - does it have
> > > > https://issues.apache.org/jira/browse/HBASE-8207 - since you use hyphens
> > > > in your paths. One way to get back up is to delete these nodes but then
> > > > you lose data in these WAL(s)...
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 2:22 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >  va-p-hbase-02-d,60020,1369249862401
> > > > >
> > > > >
> > > > > On Thu, May 23, 2013 at 12:20 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Basically
> > > > > >
> > > > > > ls /hbase/rs and what do you see for va-p-02-d ?
> > > > > >
> > > > > >
> > > > > > > On Wed, May 22, 2013 at 2:19 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > Can you do ls /hbase/rs and see what you get for 02-d - instead of
> > > > > > > > looking in /replication/, could you look in /hbase/replication/rs - I
> > > > > > > > want to see if the timestamps are matching or not ?
> > > > > > >
> > > > > > > Varun
> > > > > > >
> > > > > > >
> > > > > > > > On Wed, May 22, 2013 at 2:17 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > >> I see - so looks okay - there's just a lot of deep nesting in there - if
> > > > > > > >> you look into these nodes by doing ls - you should see a bunch of
> > > > > > > >> WAL(s) which still need to be replicated...
> > > > > > >>
> > > > > > >> Varun
> > > > > > >>
> > > > > > >>
> > > > > > > >> On Wed, May 22, 2013 at 2:16 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > > > > >>
> > > > > > > >>> 2013-05-22 15:31:25,929 WARN
> > > > > > > >>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> > > > > > > >>> ZooKeeper exception:
> > > > > > > >>> org.apache.zookeeper.KeeperException$SessionExpiredException: