HBase, mail # user - When replication is stopped, .oldlogs is never cleaned


Dave Latham 2013-02-26, 20:25
Re: When replication is stopped, .oldlogs is never cleaned
Jean-Daniel Cryans 2013-02-26, 20:53
The stop_replication command is really a way to kill it, not a way to
stop it. My bad for naming it like that. It should only be used if
you're having problems and need to stop all replication activities
from happening. It is dirty by nature.

It won't clean the logs since you may want to restart replication
after killing it. One could make the point that since killing
replication is dirty you don't need to keep the logs around, which would
be fair. But to me you should never have to stay on stop_replication
more than a few minutes: either you continue replicating, you drop
the peer, or you disable that peer.

FWIW setting hbase.replication to true with no peers should achieve
what you want, no need to call stop_replication.

J-D

On Tue, Feb 26, 2013 at 3:25 PM, Dave Latham <[EMAIL PROTECTED]> wrote:
> We have been preparing to enable replication between two large clusters.
> For the past couple of weeks, replication has been enabled via
> hbase-site.xml, but the replication state has been false (set false by
> issuing a stop_replication command).
>
> The master is no longer cleaning any logs from /hbase/.oldlogs.  It reached
> 2MM+ logs using 140TB of data before we noticed that the hbase master heap
> was growing (about 2GB in use by the LogCleaner from the FileStatus objects
> of this directory).  Looking at ReplicationLogCleaner, the first check it
> makes is whether replication is stopped; if so, it prevents all logs from
> being cleaned, which can lead to the master going OOM or HDFS running out of
> space.  I would have expected that once replication is stopped, it would
> allow logs to be cleaned and expired.
> Looking through JIRAs, I suspect this is the cause of
> https://issues.apache.org/jira/browse/HBASE-3489
>
> I believe our fix will be to start_replication with no peers enabled, but I
> think the ReplicationLogCleaner should be changed.  Anyone else care to
> weigh in with an opinion?  (JD?)
>
> There's also some discussion about the "kill switch" that may be relevant
> here:
> https://issues.apache.org/jira/browse/HBASE-5222
>
> Dave
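
The behaviour discussed in this thread can be illustrated with a small, self-contained Java sketch. This is not the HBase ReplicationLogCleaner source: the class LogCleanerSketch, its method names, and the idea of passing the replication state and the set of still-queued logs as plain parameters are all assumptions made for illustration. It only models the two points from the messages: a stopped replication state keeps every log, while a running state with no peers (and so nothing queued) lets every log be cleaned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the behaviour described in the thread.
// None of these names come from HBase itself.
public class LogCleanerSketch {

    // Behaviour as described: the first check is the replication state.
    // If replication is stopped, no log is ever reported as deletable.
    static boolean isDeletableAsDescribed(String log,
                                          boolean replicationRunning,
                                          Set<String> logsStillQueued) {
        if (!replicationRunning) {
            return false; // stopped: keep everything in case replication restarts
        }
        // running: keep only logs that some peer still has to ship
        return !logsStillQueued.contains(log);
    }

    // Behaviour the original poster expected (an assumption about the change
    // he has in mind): a stopped state does not hold logs back.
    static boolean isDeletableAsExpected(String log,
                                         boolean replicationRunning,
                                         Set<String> logsStillQueued) {
        if (!replicationRunning) {
            return true; // stopped: let logs be cleaned and expired
        }
        return !logsStillQueued.contains(log);
    }

    // Counts how many archived logs the "as described" check would release.
    static int countDeletable(List<String> oldLogs,
                              boolean replicationRunning,
                              Set<String> logsStillQueued) {
        List<String> deletable = new ArrayList<>();
        for (String log : oldLogs) {
            if (isDeletableAsDescribed(log, replicationRunning, logsStillQueued)) {
                deletable.add(log);
            }
        }
        return deletable.size();
    }

    public static void main(String[] args) {
        List<String> oldLogs = Arrays.asList("log-1", "log-2", "log-3");
        Set<String> noPeersNothingQueued = new HashSet<>();

        // Stopped via stop_replication: nothing is released.
        System.out.println("stopped: "
                + countDeletable(oldLogs, false, noPeersNothingQueued));

        // Running with no peers: nothing is queued, so every log is released.
        System.out.println("running, no peers: "
                + countDeletable(oldLogs, true, noPeersNothingQueued));
    }
}
```

In this model the workaround suggested in the reply (replication running, no peers) and the change the original poster expected give the same result for an empty queue; they differ only in what happens while the state is stopped.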
Dave Latham 2013-02-26, 21:03
Jean-Daniel Cryans 2013-02-27, 00:23
Dave Latham 2013-02-27, 14:45