Re: RS crash upon replication
Can you do ls /hbase/rs and see what you get for 02-d? Also, instead of
looking in /replication/, could you look in /hbase/replication/rs? I want
to see whether the timestamps match.

Varun
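
One way to check whether the timestamps match is to compare the trailing number in each znode name under the two paths. The sketch below assumes server names have the host,port,startcode shape seen elsewhere in this thread. The function names are made up for illustration, and the sample lists are illustrative inputs, not output from the actual cluster.

```python
def parse_server_name(name):
    """Split 'host,port,startcode' into (host, port, startcode)."""
    host, port, startcode = name.split(",")
    return host, int(port), int(startcode)


def compare_startcodes(live_names, replication_names):
    """For each host, return (live startcode, replication startcode, match?).

    live_names: children of /hbase/rs (assumed input)
    replication_names: children of /hbase/replication/rs (assumed input)
    A host missing from one listing shows up with None on that side.
    """
    live = {h: sc for h, _, sc in map(parse_server_name, live_names)}
    repl = {h: sc for h, _, sc in map(parse_server_name, replication_names)}
    report = {}
    for host in sorted(set(live) | set(repl)):
        a, b = live.get(host), repl.get(host)
        report[host] = (a, b, a == b)
    return report


if __name__ == "__main__":
    # Hypothetical listings pasted from the two "ls" commands.
    live = ["va-p-hbase-01-c,60020,1369249873379"]
    repl = ["va-p-hbase-01-c,60020,1369249873379",
            "va-p-hbase-02-d,60020,1369233252475"]
    for host, row in compare_startcodes(live, repl).items():
        print(host, row)
```

A host whose two start codes differ, or that appears only under the replication path, would be the one to look at first.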
On Wed, May 22, 2013 at 2:17 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> I see - so it looks okay - there's just a lot of deep nesting in there. If
> you look into these nodes by doing ls, you should see a bunch of
> WAL(s) which still need to be replicated...
>
> Varun
>
>
> On Wed, May 22, 2013 at 2:16 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
>> 2013-05-22 15:31:25,929 WARN
>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
>> ZooKeeper exception:
>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>> KeeperErrorCode = Session expired for
>> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719
>>
>> 01->[01->02->02]->01
>>
>> Looks like a bunch of cascading failures causing this deep nesting...
>>
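
The nesting in the znode name above can be unpacked mechanically. This is a minimal sketch that assumes the queue name is a peer ID followed by a hyphen-separated chain of host,port,startcode server names, as the path in the log line suggests, and that the peer ID itself contains no hyphen. The helper name parse_queue_name is made up for illustration and is not an HBase API.

```python
import re

# A server name looks like "host,port,startcode"; hosts may contain hyphens,
# so the commas are used to find the boundaries instead of splitting on "-".
SERVER = re.compile(r"([^,]+),(\d+),(\d+)(?:-|$)")


def parse_queue_name(znode):
    """Return (peer_id, [(host, port, startcode), ...]) for a queue znode name.

    A plain queue such as "1" yields an empty server chain.
    """
    peer_id, _, rest = znode.partition("-")
    servers = [(h, int(p), int(sc)) for h, p, sc in SERVER.findall(rest)]
    return peer_id, servers


if __name__ == "__main__":
    name = ("1-va-p-hbase-01-c,60020,1369042378287"
            "-va-p-hbase-02-c,60020,1369042377731"
            "-va-p-hbase-02-d,60020,1369233252475")
    peer, chain = parse_queue_name(name)
    print("peer:", peer)
    for host, port, startcode in chain:
        print(host, port, startcode)
```

Run on the queue name from the log line, this lists three server names after peer 1, which lines up with the 01->02->02 chain noted above.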
>>
>> On Wed, May 22, 2013 at 2:09 PM, Amit Mor <[EMAIL PROTECTED]> wrote:
>>
>>> empty return:
>>>
>>> [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls
>>> /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
>>> []
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 12:05 AM, Varun Sharma <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>> > Can you do an "ls" here, not a "get", and share the output?
>>> >
>>> > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
>>> >
>>> >
>>> > On Wed, May 22, 2013 at 1:53 PM, [EMAIL PROTECTED] <
>>> > [EMAIL PROTECTED]> wrote:
>>> >
>>> > > [zk: va-p-zookeeper-01-c:2181(CONNECTED) 3] get
>>> > > /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
>>> > >
>>> > > cZxid = 0x60281c1de
>>> > > ctime = Wed May 22 15:11:17 EDT 2013
>>> > > mZxid = 0x60281c1de
>>> > > mtime = Wed May 22 15:11:17 EDT 2013
>>> > > pZxid = 0x60281c1de
>>> > > cversion = 0
>>> > > dataVersion = 0
>>> > > aclVersion = 0
>>> > > ephemeralOwner = 0x0
>>> > > dataLength = 0
>>> > > numChildren = 0
>>> > >
>>> > >
>>> > >
>>> > > On Wed, May 22, 2013 at 11:49 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>> > >
>>> > > > What does this command show you ?
>>> > > >
>>> > > > get /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
>>> > > >
>>> > > > Cheers
>>> > > >
>>> > > > On Wed, May 22, 2013 at 1:46 PM, [EMAIL PROTECTED] <
>>> > > > [EMAIL PROTECTED]> wrote:
>>> > > >
>>> > > > > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379
>>> > > > > [1]
>>> > > > > [zk: va-p-zookeeper-01-c:2181(CONNECTED) 2] ls
>>> > > > > /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
>>> > > > > []
>>> > > > >
>>> > > > > I'm on hbase-0.94.2-cdh4.2.1
>>> > > > >
>>> > > > > Thanks
>>> > > > >
>>> > > > >
>>> > > > > On Wed, May 22, 2013 at 11:40 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>> > > > >
>>> > > > > > Also what version of HBase are you running ?
>>> > > > > >
>>> > > > > >
>>> > > > > > On Wed, May 22, 2013 at 1:38 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>> > > > > >
>>> > > > > > > Basically,
>>> > > > > > >
>>> > > > > > > You had va-p-hbase-02 crash. That caused all the replication-related
>>> > > > > > > data in ZooKeeper to be moved to va-p-hbase-01, which took over
>>> > > > > > > replicating 02's logs. Each region server also maintains an in-memory
>>> > > > > > > copy of what's in ZK. It seems that when you start up 01, it tries to
>>> > > > > > > replicate the 02 logs underneath, but it fails because that data is
>>> > > > > > > not in ZK. This is somewhat weird...
>>> > > > > > >
>>> > > > > > > Can you open the ZooKeeper shell and do
>>> > > > > > >
>>> > > > > > > ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379