HBase >> mail # user >> Never ending distributed log split


Re: Never ending distributed log split
Can you search for 1d44b0630ed7785106a87a2bd4993551/recovered.edits to see
when it was created?
The NameNode log would be a good place to start.
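
One way to do that search, as a minimal sketch: scan the NameNode log for the first line that mentions the recovered.edits directory. The function names are made up for illustration, and the log file location depends on your installation, so the path you pass in is your own.

```python
from typing import Iterable, Optional

# Region directory fragment taken from the message above.
NEEDLE = "1d44b0630ed7785106a87a2bd4993551/recovered.edits"


def first_mention(lines: Iterable[str], needle: str = NEEDLE) -> Optional[str]:
    """Return the first line containing `needle`, or None if it never appears."""
    for line in lines:
        if needle in line:
            return line.rstrip("\n")
    return None


def first_mention_in_file(log_path: str, needle: str = NEEDLE) -> Optional[str]:
    """Same search over a log file; `log_path` is whatever your NameNode log is called."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(log_path, errors="replace") as log:
        return first_mention(log, needle)
```

The timestamp at the start of the returned line should tell you roughly when the directory first showed up. Rotated logs may need to be scanned as well, oldest first.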

bq. we can also rename it so if really required we can replay it later?

The above is a better way of handling the situation.
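
A rough sketch of the rename, assuming the standard `hadoop fs -mv` shell command is on the PATH. The helper name, the `.bak` suffix and the dry-run default are my own choices for illustration, and the full HDFS path of the directory has to come from your cluster since only the region part is known here.

```python
import subprocess
from typing import List


def rename_recovered_edits(recovered_edits_dir: str,
                           suffix: str = ".bak",
                           dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the HDFS move that sets the directory aside.

    `recovered_edits_dir` is the full HDFS path of the recovered.edits
    directory; it is an input here, not something this sketch can guess.
    """
    src = recovered_edits_dir.rstrip("/")
    cmd = ["hadoop", "fs", "-mv", src, src + suffix]
    if not dry_run:
        # check=True raises CalledProcessError if the move fails.
        subprocess.run(cmd, check=True)
    return cmd
```

With `dry_run=True` it only returns the command so it can be reviewed first. Whether a renamed sibling directory is ignored when the region opens is worth checking for your HBase version before relying on it.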

What version of HBase are you using?

Cheers

On Sun, Jun 2, 2013 at 8:09 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> My HBase was in a bad state recently. HBCK did a slow but good job and
> everything is now almost stable. However, I still have one log split
> which is not working. Every minute, the SplitLogManager tries to split
> the log, fails, and retries. It's always the same file. It's assigned to
> different nodes, but they all fail, and it starts again and again.
>
>
> 2013-06-02 10:44:20,298 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of
> logs to split
> 2013-06-02 10:44:20,298 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs
> in [hdfs://node3:9000/hbase/.logs/node7,60020,1370118961527-splitting]
> 2013-06-02 10:44:20,298 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: wait for status of
> task
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370122562614
> to change to DELETED
> 2013-06-02 10:44:20,315 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback:
> deleted
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370122562614
> 2013-06-02 10:44:20,329 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task
> at znode
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370122562614
> 2013-06-02 10:44:20,341 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task
> at znode
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370129764666
> 2013-06-02 10:44:20,344 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370122562614
> ver = 0
> 2013-06-02 10:44:20,346 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370129764666
> ver = 0
> 2013-06-02 10:44:20,384 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: task
>
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370122562614
> acquired by node1,60020,1370136472290
> 2013-06-02 10:44:20,410 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: task
>
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode7%2C60020%2C1370118961527-splitting%2Fnode7%252C60020%252C1370118961527.1370129764666
> acquired by node4,60020,1370136467255
> 2013-06-02 10:44:20,497 TRACE
> org.apache.hadoop.hbase.master.SplitLogManager: Skipping the resubmit
> of last_update = 1370184260384 last_version = 1 cur_worker_name = node1,60020,1370136472290 status = in_progress incarnation = 0
> resubmits = 0 batch = installed = 2 done = 0 error = 0  because the
> server node1,60020,1370136472290 is not marked as dead, we waited for
> 113 while the timeout is 300000
> 2013-06-02 10:44:20,497 TRACE
> org.apache.hadoop.hbase.master.SplitLogManager: Skipping the resubmit
> of last_update = 1370184260410 last_version = 1 cur_worker_name = node4,60020,1370136467255 status = in_progress incarnation = 0
> resubmits = 0 batch = installed = 2 done = 0 error = 0  because the
> server node4,60020,1370136467255 is not marked as dead, we waited for
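
The task znode names in the log above appear to be the URL-encoded HDFS paths of the log files being split, so decoding one shows which file a task refers to. A minimal sketch, assuming the `/hbase/splitlog/` prefix seen in the log and a single round of URL encoding; the function name is made up for illustration.

```python
from urllib.parse import unquote_plus

# Parent znode as it appears in the log lines above.
SPLITLOG_PREFIX = "/hbase/splitlog/"


def znode_to_log_path(znode: str) -> str:
    """Strip the splitlog prefix and undo one level of URL encoding."""
    name = znode[len(SPLITLOG_PREFIX):] if znode.startswith(SPLITLOG_PREFIX) else znode
    # Decode once only: the file name itself keeps its own %2C escapes.
    return unquote_plus(name)


task = ("/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2F"
        "node7%2C60020%2C1370118961527-splitting%2F"
        "node7%252C60020%252C1370118961527.1370122562614")
print(znode_to_log_path(task))
```

The `%252C` sequences decode to `%2C` rather than to a comma, which suggests the file name on HDFS literally contains `%2C`. Decoding a second time would give a path that does not exist.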