HBase, mail # user - Re: Region server deadlocks in master master replication


Re: Region server deadlocks in master master replication
Varun Sharma 2012-11-30, 18:03
Hi Jean,

Thanks! Could you point me to some of the fixes? We currently use
hbase-0.94.0 with some other patches.
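
For anyone triaging the same symptom, here is a minimal Python sketch (not part of HBase) that scans a saved jstack dump and lists the IPC handler threads whose stack contains a given frame, locateRegionInMeta by default. The function names split_threads and handlers_in_frame are made up for illustration, and the thread-name pattern is assumed from the dump quoted below. Mail-quote prefixes are stripped so a trace pasted from a message still parses.

```python
import re

# A thread header in a jstack dump starts with the quoted thread name, e.g.
# "PRI IPC Server handler 2 on 60020" daemon prio=10 tid=...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Leading mail-quote markers such as "> > > ".
QUOTE_PREFIX = re.compile(r'^(?:>\s?)+')


def split_threads(dump):
    """Map each thread name to the text of the lines that follow its header."""
    threads = {}
    name = None
    for raw in dump.splitlines():
        line = QUOTE_PREFIX.sub("", raw).strip()
        match = THREAD_HEADER.match(line)
        if match:
            name = match.group("name")
            threads[name] = ""
        elif name is not None:
            threads[name] += line + "\n"
    return threads


def handlers_in_frame(dump, frame="locateRegionInMeta"):
    """Return sorted names of IPC handler threads whose stack mentions frame."""
    return sorted(
        name
        for name, stack in split_threads(dump).items()
        if "IPC Server handler" in name and frame in stack
    )
```

Calling handlers_in_frame on the text of a dump returns the names of the matching handlers. If every "PRI IPC Server handler" shows up and none of the plain "IPC Server handler" threads do, that matches the picture described in this thread.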

On Fri, Nov 30, 2012 at 8:53 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Use 0.94.2, it has all the fixes you need.
>
> J-D
>
> On Fri, Nov 30, 2012 at 4:56 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
> > After clearing out some zero-byte files in /.logs and restarting the
> > cluster, all regions came online and started serving. But now I am stuck
> > again. The master moved some regions to rebalance after the restart;
> > some of them are in PENDING_CLOSE while 2 regions are offline. Again, all
> > PRI handlers are stuck in replicateLogEntries(), according to the region
> > server status page. Moreover, jstack shows that they are stuck in
> > locateRegionInMeta. The other handlers are waiting as normal. There are
> > also 0-byte files under /.logs again; not sure if these are causing the
> > issues...
> >
> > Thanks!
> >
> > On Fri, Nov 30, 2012 at 3:46 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I have a master-master replication setup with HBase 0.94.0. I only
> > > write to cluster A, and replication carries the data over to cluster B.
> > > I am having some really weird issues with cluster B. On each region
> > > server, all the priority RPC handlers are stuck in calls to
> > > replicateLogEntries while all the normal RPC handlers are just waiting.
> > >
> > > From the logs I could see the following:
> > >
> > > 1) Region server shutdown
> > > Stopping the region server showed some issues. Exceptions were thrown
> > > while closing regions, both in the locateRegionInMeta calls and while
> > > trying to get the value of /hbase/root-region-server (I have checked
> > > with a manual client, and ZooKeeper is working fine).
> > >
> > > 2) jstack traces show that there are issues with locating the META and
> > > the ROOT tables
> > >
> > > "PRI IPC Server handler 2 on 60020" daemon prio=10 tid=0x00007f4ddcd39000 nid=0x2dbf waiting on condition [0x00007f4dd9edc000]
> > >    java.lang.Thread.State: TIMED_WAITING (sleeping)
> > >     at java.lang.Thread.sleep(Native Method)
> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1046)
> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> > >     at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
> > >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
> > >     at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
> > >     at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:268)
> > >     at org.apache.hadoop.hbase.client.HTablePool.findOrCreateTable(HTablePool.java:198)
> > >     at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:173)
> > >     at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:216)
> > >     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:171)
> > >
> > > "IPC Server handler 3 on 60020" daemon prio=10 tid=0x00007f4ddcb1d800 nid=0x2db6 waiting on condition [0x00007f4dda7e6000]
> > >    java.lang.Thread.State: WAITING (parking)
> > >     at sun.misc.Unsafe.park(Native Method)
> > >     - parking to wait for  <0x000000056aa146e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)