HBase >> mail # dev >> Re: [Shadow Regions / Read Replicas ] Block Affinity

Re: [Shadow Regions / Read Replicas ] Block Affinity
On Tue, Dec 3, 2013 at 3:46 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 3, 2013 at 11:37 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> > I think we do not want to differentiate between RS's by splitting them
> > between primaries and shadows. This will complicate provisioning,
> > administration, monitoring and load balancing a lot, and will not achieve
> > very cheap secondary region promotions (because you have to move the
> > region still as you described).
> >
> The idea of having "primary hosts" and "replica hosts" was brought up in
> initial design discussions over here. I am particularly against this
> approach because of the additional complexity. I need to update myself on
> Enis's doc (I'm a week+ behind), but my opinion is that we treat
> non-primary regions (be they "read replicas" or "shadow regions") as
> first-class, independent entities. These entities can be assigned to any
> host in the cluster, each with their own individual state machine
> instances.
> Of course, the balancer would need to be aware of the relationship between
> the primary and its non-primaries in order to maintain the balancing policy
> requirements. However, I see no reason for there to be specialization at
> the host level, and I agree with Enis's arguments against it.
> -n

I think there was a misunderstanding here -- I made a distinction among
the "normal" primary regions, eventually consistent read-replica/secondary
regions, and shadow memstore regions (for fast consistent read recovery).
All region servers would be able to host normal primary regions,
read-replica regions and shadow memstore regions.

There would be different potential sweet spots if read-replica regions and
shadow memstore regions were co-located at region recovery time, with
trade-offs among fast consistent recovery, the ability to have more recent
values, locality optimizations and load balancing optimizations.
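
To make the distinction concrete, here is a rough sketch in Python. It is
not HBase code: the names Role, RegionCopy, can_host and assign are invented
for illustration, and the placement rule (a primary never shares a server
with its own non-primaries, while a read replica and a shadow memstore of
the same region may be co-located) is only an assumption about one possible
balancing policy. The point is that every server can host every kind of
region, and only the balancer knows how the copies relate.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"                  # normal primary region
    READ_REPLICA = "read_replica"        # eventually consistent secondary
    SHADOW_MEMSTORE = "shadow_memstore"  # kept for fast consistent recovery


@dataclass(frozen=True)
class RegionCopy:
    region: str  # logical region name shared by a primary and its non-primaries
    role: Role


def can_host(server_copies, candidate):
    """Any server may take any role. Assumed policy: a primary never shares a
    server with another copy of the same region; non-primaries may co-locate."""
    for existing in server_copies:
        if existing.region != candidate.region:
            continue
        if Role.PRIMARY in (existing.role, candidate.role):
            return False
    return True


def assign(cluster, candidate):
    """Place the candidate on the least-loaded server that passes can_host.
    Ties are broken by server name. Returns the server name, or None."""
    for server in sorted(cluster, key=lambda s: (len(cluster[s]), s)):
        if can_host(cluster[server], candidate):
            cluster[server].append(candidate)
            return server
    return None


if __name__ == "__main__":
    cluster = {"rs1": [], "rs2": []}
    for role in (Role.PRIMARY, Role.READ_REPLICA, Role.SHADOW_MEMSTORE):
        print(role.value, "->", assign(cluster, RegionCopy("r1", role)))
```

With two servers, the primary lands on one and both non-primaries end up
co-located on the other, which is one of the sweet spots mentioned above. A
different policy would only change can_host, not the servers themselves.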


// Jonathan Hsieh (shay)
// Software Engineer, Cloudera