HBase Replication questions

We are looking into HBase replication to separate our client-facing HBase
cluster from the one we need to run analytics against (likely heavy MR jobs and
potentially big scans).

1. How long does it take for edits to be propagated to a slave cluster?

As far as I understand from the HBase Replication page
(http://hbase.apache.org/replication.html), each region server holds a separate
buffer that accumulates data (edits from the edit log that should be replicated)
before sending it to the slave cluster's RSs. So basically data is sent to the
slave cluster when:
* the buffer is full. Is there an option to configure its size (as a way to
affect flushing frequency)?
* the end of the edit log is reached by this "working thread". Does the thread
process the edit log periodically, or does it watch the edit log for changes and
act "immediately"? If the former, what is the default interval between runs? Can
it be configured?
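
To make the two conditions above concrete, here is a rough Python sketch of how I picture the shipping logic. It is only a toy model of my reading of the replication page, not actual HBase code: the names `ReplicationBuffer`, `ship_batches` and `capacity_bytes` are made up for illustration, and the real implementation may well behave differently.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ReplicationBuffer:
    """Hypothetical per-region-server buffer of edits awaiting shipment."""
    capacity_bytes: int
    edits: List[bytes] = field(default_factory=list)
    size_bytes: int = 0

    def add(self, edit: bytes) -> None:
        self.edits.append(edit)
        self.size_bytes += len(edit)

    def is_full(self) -> bool:
        return self.size_bytes >= self.capacity_bytes

    def drain(self) -> List[bytes]:
        # Hand back the accumulated edits and reset the buffer.
        batch, self.edits, self.size_bytes = self.edits, [], 0
        return batch


def ship_batches(log_edits: Iterable[bytes], capacity_bytes: int) -> List[List[bytes]]:
    """Return the batches that would be sent to the slave cluster.

    A batch is emitted when the buffer fills up (condition 1) or when
    the end of the edit log is reached (condition 2).
    """
    buf = ReplicationBuffer(capacity_bytes)
    batches: List[List[bytes]] = []
    for edit in log_edits:
        buf.add(edit)
        if buf.is_full():           # condition 1: buffer is full
            batches.append(buf.drain())
    if buf.edits:                   # condition 2: end of the edit log reached
        batches.append(buf.drain())
    return batches
```

In this model, a smaller `capacity_bytes` means more frequent, smaller shipments, which is why I am asking whether the real buffer size is configurable.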

2. How reliable is replication?

It looks like networking issues that make the slave cluster unreachable are
handled gracefully by the replication mechanism. It sounds like this should
also cover the slave cluster going down for some reason. Are there any
scenarios in which replication can break?

3. Replication of existing (and possibly big) cluster after the fact.

What are the options for replicating all existing data to a new (and empty)
slave cluster if replication wasn't configured from the start, and then
continuing to replicate from that point? It seems that this might not be
possible, because edit logs on the master cluster get cleaned up.
