HBase >> mail # user >> RS crash upon replication


Re: RS crash upon replication
2013-05-22 15:31:25,929 WARN
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
ZooKeeper exception:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for
/hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719

01->[01->02->02]->01

Looks like a bunch of cascading failures causing this deep nesting...
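
To make the nesting easier to read, here is a rough sketch that splits the queue znode name from the log line above into the peer id and the chain of region servers embedded in it. It assumes the name has the shape peerId-server-server-... with each server written as host,port,startcode, which is what the path above looks like; parse_queue_znode is a made-up helper for illustration, not an HBase API.

```python
import re

# Assumption: each embedded server name looks like "host,port,startcode",
# and consecutive server names are joined by a single "-".
SERVER = re.compile(r"(?:^|-)([^,]+?,\d+,\d+)")


def parse_queue_znode(name):
    """Split a queue znode name into (peer_id, [server names])."""
    # Everything before the first "-" is taken to be the peer id.
    peer_id, _, rest = name.partition("-")
    return peer_id, SERVER.findall(rest)


# Queue znode name copied from the log line above.
queue = ("1-va-p-hbase-01-c,60020,1369042378287"
         "-va-p-hbase-02-c,60020,1369042377731"
         "-va-p-hbase-02-d,60020,1369233252475")

peer_id, servers = parse_queue_znode(queue)
hosts = [s.split(",")[0] for s in servers]
print(peer_id, " -> ".join(hosts))
```

Three server names inside a single queue name means the queue has been handed over several times, which matches the 01->02->02 part of the chain above.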
On Wed, May 22, 2013 at 2:09 PM, Amit Mor <[EMAIL PROTECTED]> wrote:

> empty return:
>
> [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls
> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> []
>
>
>
> On Thu, May 23, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
>
> > Do an "ls" not a get here and give the output ?
> >
> > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> >
> >
> > On Wed, May 22, 2013 at 1:53 PM, [EMAIL PROTECTED] <
> > [EMAIL PROTECTED]> wrote:
> >
> > > [zk: va-p-zookeeper-01-c:2181(CONNECTED) 3] get
> > > /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > >
> > > cZxid = 0x60281c1de
> > > ctime = Wed May 22 15:11:17 EDT 2013
> > > mZxid = 0x60281c1de
> > > mtime = Wed May 22 15:11:17 EDT 2013
> > > pZxid = 0x60281c1de
> > > cversion = 0
> > > dataVersion = 0
> > > aclVersion = 0
> > > ephemeralOwner = 0x0
> > > dataLength = 0
> > > numChildren = 0
> > >
> > >
> > >
> > > On Wed, May 22, 2013 at 11:49 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > What does this command show you ?
> > > >
> > > > get /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > > >
> > > > Cheers
> > > >
> > > > On Wed, May 22, 2013 at 1:46 PM, [EMAIL PROTECTED] <
> > > > [EMAIL PROTECTED]> wrote:
> > > >
> > > > > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
> > > > > [1]
> > > > > [zk: va-p-zookeeper-01-c:2181(CONNECTED) 2] ls
> > > > > /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
> > > > > []
> > > > >
> > > > > I'm on hbase-0.94.2-cdh4.2.1
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > > On Wed, May 22, 2013 at 11:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Also what version of HBase are you running ?
> > > > > >
> > > > > >
> > > > > > > On Wed, May 22, 2013 at 1:38 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Basically,
> > > > > > >
> > > > > > > > You had va-p-hbase-02 crash - that caused all the replication-related
> > > > > > > > data in ZooKeeper to be moved to va-p-hbase-01 and have it take over
> > > > > > > > replicating 02's logs. Now each region server also maintains an
> > > > > > > > in-memory state of what's in ZK. It seems like when you start up 01,
> > > > > > > > it's trying to replicate the 02 logs underneath, but it's failing
> > > > > > > > because that data is not in ZK. This is somewhat weird...
> > > > > > > >
> > > > > > > > Can you open the ZooKeeper shell and do
> > > > > > >
> > > > > > > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
> > > > > > >
> > > > > > > And give the output ?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 22, 2013 at 1:27 PM, [EMAIL PROTECTED] <
> > > > > > > [EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > >> Hi,
> > > > > > >>
> > > > > > > >> This is bad ... and happened twice: I had my replication-slave cluster
> > > > > > > >> offlined. I performed quite a massive Merge operation on it and after a
> > > > > > > >> couple of hours it had finished and I returned it back online. At the
> > > > > > > >> same time, the replication-master RS machines crashed (see first crash
> > > > > > > >> http://pastebin.com/1msNZ2tH) with the first exception being:
> > > > > > >>
> > > > > > >> org.apache.zookeeper.KeeperException$NoNodeException: