On Fri, Nov 9, 2012 at 5:13 AM, Jan Van Besien <[EMAIL PROTECTED]> wrote:
> I am trying to understand in detail how HBase replication works.
> First of all, I assume that, for replication to work correctly, all edits
> must be replayed on the replica HBase cluster in the same order as they
> were executed on the source HBase cluster. Correct?
Not really, as it just tails the WAL at each regionserver in the
source cluster and sends the edits to the destination cluster. The
KeyValues are not altered while being sent to the slave, so even if
they arrive out of order, they end up with the same effect on the
slave (each one keeps the timestamp it was written with on the
source). They are in order from a source cluster regionserver's
perspective, not from a table instance's perspective.
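A toy model of why arrival order does not matter here, under simplifying assumptions: the slave resolves each row/column to its highest-timestamp version (which is what a default read returns), and deletes, multiple retained versions and equal timestamps are ignored. `ReplaySketch`, `Cell` and `apply` are hypothetical names for illustration, not HBase API. Replaying the same timestamped edits in a shuffled order yields the same final state:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ReplaySketch {
    // Hypothetical, simplified stand-in for a replicated KeyValue: the
    // coordinates, the timestamp assigned on the source cluster, and a value.
    record Cell(String row, String column, long timestamp, String value) {}

    // Applies edits in the given order, keeping per (row, column) only the
    // cell with the highest timestamp, i.e. the version a default read sees.
    static Map<String, Cell> apply(List<Cell> edits) {
        Map<String, Cell> store = new HashMap<>();
        for (Cell c : edits) {
            String key = c.row() + "/" + c.column();
            Cell current = store.get(key);
            if (current == null || c.timestamp() > current.timestamp()) {
                store.put(key, c);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Cell> edits = new ArrayList<>(List.of(
                new Cell("row1", "cf:a", 100L, "v1"),
                new Cell("row1", "cf:a", 200L, "v2"),
                new Cell("row2", "cf:a", 150L, "x")));
        Map<String, Cell> inOrder = apply(edits);

        // Simulate out-of-order arrival at the slave.
        Collections.shuffle(edits, new Random(42));
        Map<String, Cell> shuffled = apply(edits);

        System.out.println(inOrder.equals(shuffled)); // prints "true"
        System.out.println(shuffled.get("row1/cf:a").value()); // prints "v2"
    }
}
```

The model only stays order-independent because no two edits to the same cell share a timestamp; the sketch deliberately avoids that case.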
You can refer to the blog posts on how replication works.
> If so, I am trying to understand how that is guaranteed.
> I can see that this trivially holds when replication reads the edits from
> the HLog and uses that as its source.
> However, what if a region is moved to another region server? Can we not end
> up in the following situation?
> 1) region A is originally hosted by region server X.
> 2) replication in region server X is replicating edits of region A. Say that
> it is lagging behind a bit, so it still has a number of edits to ship.
> 3) region A is moved to region server Y.
> 4) edits for region A arrive on region server Y, and replication on region
> server Y starts replicating them.
> 5) replication in region server X is still busy with some leftover edits
> from region A, so these are replicated out of order.
> So the question really is whether there is a mechanism to prevent the
> replication source from reading edits in an HLog for a region that has
> meanwhile been moved to another region server.
> It could have something to do with log splitting and recovery, but I was
> under the impression that HBase only splits logs during recovery and/or a
> master restart, not when regions are moved.
> I hope somebody can shed some light on this topic.
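The scenario in steps 1-5 can be simulated to show exactly what "in order per source regionserver, not per table" means. This is a hypothetical model, not HBase code, and it says nothing about whether HBase has a mechanism to prevent the race: `RegionMoveSketch`, `Edit`, `interleave`, `inOrderFor` and `globallyInOrder` are illustrative names, and the sink is assumed to receive the two sources' streams alternately, one edit at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RegionMoveSketch {
    // Hypothetical edit: the region server whose WAL holds it, and the order
    // in which the client wrote it to region A (1 = first write).
    record Edit(String server, int writeOrder) {}

    // Two replication sources ship concurrently. Each sends its own WAL
    // entries strictly in order; the sink sees the streams interleaved.
    static List<Edit> interleave(Deque<Edit> fromX, Deque<Edit> fromY) {
        List<Edit> received = new ArrayList<>();
        while (!fromX.isEmpty() || !fromY.isEmpty()) {
            if (!fromX.isEmpty()) received.add(fromX.poll());
            if (!fromY.isEmpty()) received.add(fromY.poll());
        }
        return received;
    }

    // True if one server's edits arrive in increasing write order.
    static boolean inOrderFor(List<Edit> received, String server) {
        int last = 0;
        for (Edit e : received) {
            if (!e.server().equals(server)) continue;
            if (e.writeOrder() < last) return false;
            last = e.writeOrder();
        }
        return true;
    }

    // True if all edits arrive in increasing write order, whatever the server.
    static boolean globallyInOrder(List<Edit> received) {
        for (int i = 1; i < received.size(); i++) {
            if (received.get(i).writeOrder() < received.get(i - 1).writeOrder()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Steps 1-2: the source on X still has edits 1..3 of region A to ship.
        Deque<Edit> x = new ArrayDeque<>(List.of(
                new Edit("X", 1), new Edit("X", 2), new Edit("X", 3)));
        // Steps 3-4: region A moves to Y; new edits 4..5 land in Y's WAL.
        Deque<Edit> y = new ArrayDeque<>(List.of(new Edit("Y", 4), new Edit("Y", 5)));
        // Step 5: both sources ship at the same time.
        List<Edit> received = interleave(x, y);

        System.out.println(received.stream().map(e -> e.server() + e.writeOrder()).toList());
        // prints "[X1, Y4, X2, Y5, X3]"
        System.out.println(inOrderFor(received, "X") && inOrderFor(received, "Y")); // prints "true"
        System.out.println(globallyInOrder(received)); // prints "false"
    }
}
```

Each source's edits stay in order, but region A's edits as a whole do not, which is the reordering the question describes.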