I am trying to understand in detail how HBase replication works.
First of all, I assume that for replication to work correctly, all edits
must be replayed on the replica HBase cluster in the same order as they
were executed on the source HBase cluster. Is that correct?
If so, I am trying to understand how that is guaranteed.
I can see that this holds trivially when the edits are read from the HLog
and that log is used as the source for replication.
However, what if a region is moved to another region server? Could we not
end up in the following situation?
1) region A is originally hosted by region server X.
2) replication in region server X is replicating edits of region A. Say
that it is lagging behind a bit, so it has a number of edits still to do.
3) region A is moved to region server Y.
4) edits for region A arrive on region server Y, and replication on
region server Y starts replicating them
5) replication in region server X is still busy with some leftover
edits from region A, so these are replicated out of order
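
To make the scenario concrete, here is a toy Python sketch of the five steps above. It is not HBase code and says nothing about what HBase actually does: the two queues merely stand in for the HLogs of region servers X and Y, and the sequence numbers and the function name simulate_region_move are made up for illustration.

```python
from collections import deque


def simulate_region_move():
    """Toy model of the suspected race; hypothetical, not HBase internals."""
    log_x = deque()  # stands in for the HLog of region server X
    log_y = deque()  # stands in for the HLog of region server Y
    replica = []     # order in which edits reach the replica cluster

    # Steps 1-2: region A lives on X. X has logged three edits but,
    # because replication lags, has shipped only the first one.
    for seq in (1, 2, 3):
        log_x.append(("A", seq))
    replica.append(log_x.popleft())

    # Steps 3-4: region A moves to Y. New edits land in Y's log and
    # Y replicates them right away.
    for seq in (4, 5):
        log_y.append(("A", seq))
    while log_y:
        replica.append(log_y.popleft())

    # Step 5: X finally ships its leftover edits for region A.
    while log_x:
        replica.append(log_x.popleft())

    return [seq for _region, seq in replica]


order = simulate_region_move()
print(order)                   # [1, 4, 5, 2, 3]
print(order == sorted(order))  # False: edits arrived out of order
```

In this model nothing coordinates the two queues, so edits 2 and 3 reach the replica after edits 4 and 5. My question is whether HBase has something that rules this interleaving out.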
So the question really is whether there is a mechanism that prevents the
replication source from reading edits in an HLog for a region that has
meanwhile been moved to another region server.
It could be that it has something to do with log splitting and recovery,
but I was under the assumption that HBase only splits logs in case of
recovery and/or master restart, and not in case of region moves.
I hope somebody can shed some light on this topic.