HBase >> mail # user >> Under Heavy Write Load + Replication On : Brings All My Region Servers Dead


Re: Under Heavy Write Load + Replication On : Brings All My Region Servers Dead
Hello Ameya,

Sorry to hear that.

You have two options:

1) Apply the HBASE-8099 patch to your version
(https://issues.apache.org/jira/browse/HBASE-8099). The patch is simple, so
it should be easy to do, OR
2) Turn off the zk.multi feature (see hbase-default.xml). You can refer to
the CDH4.2.0 docs for that.
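
For option 2, a minimal sketch of the override, assuming the "zk.multi feature" refers to the `hbase.zookeeper.useMulti` property (the property name is not spelled out in this message, so please confirm it against your hbase-default.xml). It would go in hbase-site.xml on each region server, followed by a restart:

```xml
<!-- Assumption: hbase.zookeeper.useMulti is the switch meant by "zk.multi". -->
<!-- Setting it to false makes HBase fall back to sequential ZooKeeper operations. -->
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>false</value>
</property>
```

With multi disabled, the replication queue failover no longer goes through copyQueuesFromRSUsingMulti, which is the call that fails in the stack trace below.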

The fix (HBASE-8099) will be in CDH4.2.1, though.

Please ask the list if you have any more questions.

Thanks,
Himanshu

On Wed, Apr 17, 2013 at 10:38 PM, Ameya Kantikar <[EMAIL PROTECTED]> wrote:

> I am running HBase 0.94.2 from Cloudera CDH4.2 (10-machine cluster).
>
> Under heavy write load, and when replication is on, all my region servers
> are going down.
> I checked the Cloudera version, and it has the HBASE-2611 bug patched in the
> version I am using, so I am not sure what's going on. Here is the stack:
>
> 2013-04-18 01:47:33,423 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving relevance-hbase5-snc1.snc1,60020,1366247910200's hlogs to my queue
>
> 2013-04-18 01:47:33,424 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 1
>
> 2013-04-18 01:47:33,425 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
> org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:925)
>         at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:901)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:538)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1457)
>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:585)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
>
> Followed by
>
> 2013-04-18 01:47:36,043 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server relevance-hbase2-snc1.snc1,60020,1366247745434: Writing replication status
>
>
> I checked by turning replication off, and everything seems fine. I can
> reproduce this bug almost every time I run my write-heavy job.
>
>
> Here is the complete log:
>
> http://pastebin.com/da0m475T
>
>
>
> Any ideas?
>
>
> Ameya
>