Our HBase cluster had an unexpected shutdown, and while trying to bring it back up, the Master gets stuck with the following message:
Failed splitting of [ list of <host_name>,<port>,<tmst> ]
java.io.IOException: error or interrupted while splitting logs in [ list of <host_name>,<port>,<tmst> ] Task = installed = 10 done = 0 error = 10
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:282)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:300)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:242)
    at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:661)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:580)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
    at java.lang.Thread.run(Thread.java:724)
What can I do to get the cluster operational again? There had been no data ingestion for quite a few hours before the crash, so perhaps clearing out /hbase/.logs/ could be an option.
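If it comes to that, sidelining (rather than deleting) the WAL directories is the safer variant of "clearing out" /hbase/.logs/: the master can then start cleanly, and the files remain available for inspection. A minimal sketch, assuming the /hbase/.logs path from above and a stock `hadoop fs` CLI; the sidelined directory name is made up here:

```shell
# Sideline the WALs instead of deleting them, so they can be
# inspected or replayed later if needed.
hadoop fs -mkdir /hbase/.logs-sidelined          # hypothetical holding dir
hadoop fs -mv '/hbase/.logs/*' /hbase/.logs-sidelined/
```

Only do this if you are sure the WALs contain no edits you care about; any unflushed edits in those files are lost to the regions once the master stops trying to split them.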
We ran into this a few weeks ago while adding new nodes to an existing cluster. Due to a misconfiguration, the new nodes were assigned the wrong ZooKeeper quorum and ended up forming a new cluster. We saw a similar error in our logs:
2014-01-30 16:47:19,196 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for xxxxx.xxx.urbanairship.com,60020,1385165871751, will retry
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:182)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: error or interrupted while splitting logs in [maprfs:/......./xxxx.xxxx.urbanairship.com,60020,1385165871751-splitting] Task = installed = 1 done = 0 error = 1
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:272)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:284)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:252)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:175)

We fixed it by shutting the new nodes down, moving aside the offending logs, and restarting the master. Later, we fixed the ZooKeeper configuration and brought the new nodes back into the cluster.
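To rule out the split-brain scenario described above, it is worth confirming that every node points at the same ZooKeeper quorum. A quick check, assuming a typical conf location (the /etc/hbase/conf path is an assumption; adjust for your install):

```shell
# The hbase.zookeeper.quorum value must be identical on all nodes;
# a node with a different value will join (or form) a different cluster.
grep -A1 'hbase.zookeeper.quorum' /etc/hbase/conf/hbase-site.xml
```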
Alok

On Thu, Mar 6, 2014 at 11:13 AM, David Koch <[EMAIL PROTECTED]> wrote:
Check if there are any 0-sized WALs in /hbase/.logs, sideline them, and restart. That could help. As Ted mentioned, the names of the actual problematic logs are in the logs of the RegionServers that got the split tasks assigned.

On Fri, Mar 7, 2014 at 12:43 AM, David Koch <[EMAIL PROTECTED]> wrote:

Bharath Vissapragada <http://www.cloudera.com>
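A quick way to spot the zero-length WALs mentioned above is to filter on the size column of the `hadoop fs -ls` listing. A sketch, assuming the /hbase/.logs path from the thread and the stock listing format (size in column 5, path in column 8):

```shell
# Recursively list the WAL directory and print the path of every
# file whose size ($5) is zero.
hadoop fs -ls -R /hbase/.logs | awk '$5 == 0 {print $8}'
```

Each path printed is a candidate for sidelining before restarting the master.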
Glad to know everything is up. We faced this issue too; I'm not really sure what the exact cause of it is.

On Mon, Mar 10, 2014 at 4:12 AM, David Koch <[EMAIL PROTECTED]> wrote:

Bharath Vissapragada <http://www.cloudera.com>