NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # user >> Under Heavy Write Load + Replication On : Brings All My Region Servers Dead


Ameya Kantikar 2013-04-18, 05:38
Himanshu Vashishtha 2013-04-18, 05:48
Re: Under Heavy Write Load + Replication On : Brings All My Region Servers Dead
Awesome. Thanks Himanshu.
On Wed, Apr 17, 2013 at 10:48 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:

> Hello Ameya,
>
> Sorry to hear that.
>
> You have two options:
>
> 1) Apply the HBASE-8099 patch to your version
> (https://issues.apache.org/jira/browse/HBASE-8099). The patch is simple, so
> it should be easy to do, OR
> 2) Turn off the zk.multi feature (see hbase-default.xml; you can refer to
> the CDH4.2.0 docs for that).
>
> This fix (HBASE-8099) will be in CDH4.2.1, though.
>
> Please ask the list if you have any more questions.
>
> Thanks,
> Himanshu
>
> On Wed, Apr 17, 2013 at 10:38 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:
>
> > I am running HBase 0.94.2 from Cloudera CDH4.2 (10-machine cluster).
> >
> > Under heavy write load, with replication on, all my region servers
> > go down.
> > I checked the Cloudera version: the release I am using already includes the
> > HBASE-2611 fix, so I'm not sure what's going on. Here is the stack trace:
> >
> > 2013-04-18 01:47:33,423 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving relevance-hbase5-snc1.snc1,60020,1366247910200's hlogs to my queue
> >
> > 2013-04-18 01:47:33,424 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper:  The multi list size is: 1
> >
> > 2013-04-18 01:47:33,425 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
> >
> > org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty
> >         at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
> >         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:925)
> >         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:901)
> >         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:538)
> >         at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1457)
> >         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
> >         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:585)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > Followed by
> >
> > 2013-04-18 01:47:36,043 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server relevance-hbase2-snc1.snc1,60020,1366247745434: Writing replication status
> >
> > With replication turned off, everything seems fine. I can reproduce this
> > bug almost every time I run my write-heavy job.
> >
> >
> > Here is the complete log:
> >
> > http://pastebin.com/da0m475T
> >
> >
> >
> > Any ideas?
> >
> >
> > Ameya
> >
>
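Option 2 in Himanshu's reply is a configuration change. In the hbase-default.xml shipped with HBase 0.94, the switch for ZooKeeper multi-update is hbase.zookeeper.useMulti; the fragment below is a minimal sketch of overriding it in hbase-site.xml. It assumes the override is placed on every region server and that the servers are restarted to pick it up. Check the property name and its default in the hbase-default.xml of your CDH release before relying on it.

```xml
<!-- hbase-site.xml (sketch): turn off ZooKeeper multi-update.
     Assumption: with this set to false, replication queue failover
     no longer goes through the multi-based copy
     (copyQueuesFromRSUsingMulti) seen in the stack trace above.
     Apply on all region servers, then restart them. -->
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>false</value>
</property>
```

This is a workaround only; the actual fix is the HBASE-8099 patch (option 1), which Himanshu says will ship in CDH4.2.1.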