Re: Region server deadlocks in master master replication
Hi Jean,

I looked at the release notes for 0.94.1 and 0.94.2, and it looks like all
the fixes there have to do with splitting of regions (I may be wrong). For
my cluster(s), splits are off.

Varun

On Fri, Nov 30, 2012 at 10:03 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Hi Jean,
>
> Thanks! Could you point me to some of the fixes? We currently use
> hbase-0.94.0 with some other patches.
>
> On Fri, Nov 30, 2012 at 8:53 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> Use 0.94.2, it has all the fixes you need.
>>
>> J-D
>>
>> On Fri, Nov 30, 2012 at 4:56 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>>
>> > After clearing out some files in /.logs which had size 0 and restarting
>> > the cluster, all regions came online and started serving. But now I am
>> > stuck again. The master moved some regions to rebalance after the
>> > restart, and some of them are PENDING_CLOSE while 2 regions are offline.
>> > Again, all PRI handlers are stuck in replicateLogEntries(), according to
>> > the region server status page. Moreover, jstack shows that these are
>> > stuck in locateRegionInMeta. The other handlers are waiting as normal.
>> > Also, there are 0-byte files under /.logs again - not sure if these are
>> > causing the issues...
>> >
>> > Thanks!
>> >
>> > On Fri, Nov 30, 2012 at 3:46 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a master-master replication setup with HBase 0.94.0. I only
>> > > write to cluster A, and replication carries the data over to cluster B.
>> > > I am having some really weird issues with cluster B. Basically, all the
>> > > priority RPC handlers are stuck in calls to replicateLogEntries while
>> > > all the normal RPC handlers are just waiting on each region server.
>> > >
>> > > From the logs I could see the following:
>> > >
>> > > 1) Region server shutdown
>> > > Stopping the region server showed some issues. Exceptions were thrown
>> > > while closing regions, both in the locateRegionInMeta calls and while
>> > > trying to get the value of /hbase/root-region-server (I have checked
>> > > via a manual client; ZooKeeper is working fine).
>> > >
>> > > 2) jstack traces show that there are issues with locating the META and
>> > > ROOT tables
>> > >
>> > > "PRI IPC Server handler 2 on 60020" daemon prio=10 tid=0x00007f4ddcd39000 nid=0x2dbf waiting on condition [0x00007f4dd9edc000]
>> > >    java.lang.Thread.State: TIMED_WAITING (sleeping)
>> > >     at java.lang.Thread.sleep(Native Method)
>> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1046)
>> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
>> > >     at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
>> > >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
>> > >     at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
>> > >     at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:268)
>> > >     at org.apache.hadoop.hbase.client.HTablePool.findOrCreateTable(HTablePool.java:198)
>> > >     at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:173)
>> > >     at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:216)
>> > >     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:171)
>> > >
>> > > "IPC Server handler 3 on 60020" daemon prio=10 tid=0x00007f4ddcb1d800 nid=0x2db6 waiting on condition [0x00007f4dda7e6000]
>> > >    java.lang.Thread.State: WAITING (parking)