HBase >> mail # dev >> Replication sink selection strategy


Replication sink selection strategy
Hi,

I was wondering if someone (perhaps Jean-Daniel, but anyone is welcome) could explain the reasoning for the current peer sink selection logic within replication.

As it currently stands, each region server in the master cluster randomly chooses a percentage (10% by default) of the slave cluster's region servers as its replication pool. Each time a batch of edits is shipped to the peer, one region server is chosen from this pre-selected pool of slave region servers.
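To make the current scheme concrete, here is a minimal Java sketch of the selection logic as described above. It is not HBase's actual code: the class and method names, the `ratio` parameter and the string server names are hypothetical, and only the described behaviour is modelled (a random pool of about 10% of the slaves chosen once per master region server, then one pool member picked per batch).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SinkSelectionSketch {

    // Done once per master region server: shuffle the slave cluster's region
    // servers and keep the first ceil(ratio * n) of them as this server's pool.
    static List<String> chooseSinkPool(List<String> slaveServers, double ratio, Random rnd) {
        List<String> shuffled = new ArrayList<>(slaveServers);
        Collections.shuffle(shuffled, rnd);
        int poolSize = Math.min(shuffled.size(), (int) Math.ceil(shuffled.size() * ratio));
        return new ArrayList<>(shuffled.subList(0, poolSize));
    }

    // Done for every batch of edits: pick one sink uniformly at random from the pool.
    // Assumes the pool is non-empty.
    static String pickSinkForBatch(List<String> pool, Random rnd) {
        return pool.get(rnd.nextInt(pool.size()));
    }

    public static void main(String[] args) {
        List<String> slaves = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            slaves.add("slave-rs-" + i); // hypothetical server names
        }
        Random rnd = new Random();
        List<String> pool = chooseSinkPool(slaves, 0.1, rnd); // 10%, the default described above
        System.out.println("pool: " + pool);
        for (int batch = 0; batch < 5; batch++) {
            System.out.println("batch " + batch + " -> " + pickSinkForBatch(pool, rnd));
        }
    }
}
```

With 20 slave region servers and a 10% ratio, each master region server only ever ships to the same 2 slaves until its pool is re-chosen.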

I was wondering what the advantage(s) of this approach are compared to having each master region server simply choose a slave region server at random from the full set of slave region servers for every batch. In my (probably naive) view, that would spread the replication load more evenly over the whole slave cluster, and I can't see any real advantage that the current approach has (although I assume there must be some).
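For comparison, here is a rough simulation sketch of the pooled scheme against the alternative (every batch picks uniformly from the whole slave cluster), counting how many batches each slave region server receives. All names and numbers (20 master region servers, 50 slaves, 1000 batches each) are made up for illustration. It only models the selection step and ignores everything else replication does, so it can show how the load spreads but not which strategy is better in practice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SinkDistributionSketch {

    // Pooled scheme: pick ceil(ratio * numSlaves) distinct slave ids at random, once.
    static int[] choosePool(int numSlaves, double ratio, Random rnd) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < numSlaves; i++) ids.add(i);
        Collections.shuffle(ids, rnd);
        int size = Math.min(numSlaves, (int) Math.ceil(numSlaves * ratio));
        int[] pool = new int[size];
        for (int i = 0; i < size; i++) pool[i] = ids.get(i);
        return pool;
    }

    // Returns the number of batches each slave receives after every master
    // ships batchesPerMaster batches. If pooled is false, ratio is ignored and
    // each batch goes to a slave chosen uniformly from the whole cluster.
    static long[] simulate(int numMasters, int numSlaves, int batchesPerMaster,
                           double ratio, boolean pooled, long seed) {
        Random rnd = new Random(seed);
        long[] received = new long[numSlaves];
        for (int m = 0; m < numMasters; m++) {
            int[] pool = pooled ? choosePool(numSlaves, ratio, rnd) : null;
            for (int b = 0; b < batchesPerMaster; b++) {
                int sink = pooled ? pool[rnd.nextInt(pool.length)] : rnd.nextInt(numSlaves);
                received[sink]++;
            }
        }
        return received;
    }

    // Min/max batches per slave, how many slaves received nothing, and the total.
    static String summarize(long[] received) {
        long min = Long.MAX_VALUE, max = 0, total = 0;
        int idle = 0;
        for (long r : received) {
            min = Math.min(min, r);
            max = Math.max(max, r);
            total += r;
            if (r == 0) idle++;
        }
        return String.format("min=%d max=%d idle=%d total=%d", min, max, idle, total);
    }

    public static void main(String[] args) {
        int masters = 20, slaves = 50, batches = 1000;
        System.out.println("pooled (10%):  " + summarize(simulate(masters, slaves, batches, 0.1, true, 1L)));
        System.out.println("whole cluster: " + summarize(simulate(masters, slaves, batches, 0.1, false, 1L)));
    }
}
```

With the pooled scheme a single master region server can reach at most 5 of the 50 slaves, whereas with uniform selection it will, over enough batches, hit all of them.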

Could someone let me know what the reasoning is behind the current approach?

Thanks,

Gabriel
Jean-Daniel Cryans 2013-02-12, 21:14
Gabriel Reid 2013-02-13, 12:31