Re: Setting up NxN replication
Ted, good point on the 11th location. Thanks

One thing I didn't mention (clearly) is the limitation of 'global'
replication. Imagine all 10 clusters are set up well for the 1st
table:column family. Then 6 months later, a 2nd table:column family enters
the picture. How do you limit the replication of the 2nd cf to fewer
clusters (let's say 3 of the 10)? There are at least two good reasons for
such a use case:
1) the 2nd cf is less important (or used by fewer workloads), so there is
no need to waste network/storage keeping all 10 copies (multiplied by the
# of replicas within each cluster)
2) the 2nd cf is so important that legal/business requires the data to be
kept within the U.S., but one of the 10 clusters is in Japan.

If we simply don't create the target table on 7 of the 10 clusters, the
replication queues will grow to hit the max capacity very quickly. I am not
sure about the consequences of such a situation, but my guess is it won't
be pretty.
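
For the scoping part, something like the following HBase shell commands
could work. This is just a sketch: the table, cf, and peer names are
placeholders, and per-peer table/CF lists (set_peer_tableCFs) only exist on
releases that ship that command.

    # Coarse knob: REPLICATION_SCOPE is set per column family on the source
    # table; 0 keeps the cf out of replication entirely, 1 ships its edits.
    alter 'table2', {NAME => 'cf2', REPLICATION_SCOPE => 1}

    # Finer knob (where available): restrict what each peer receives.
    # Peers that should only get the 1st cf:
    set_peer_tableCFs '7', 'table1:cf1'
    # Peers that should also get the new cf:
    set_peer_tableCFs '3', 'table1:cf1; table2:cf2'
    list_peers   # verify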

Basically, once the business requirements evolve a bit, the complexity
grows quickly.

Demai
On Fri, Nov 8, 2013 at 3:47 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. what if your company opens a new office in an 11th location?
>
> With a minimum spanning tree approach, the increase in load wouldn't be
> exponential.
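>
> Just to illustrate, here is a hypothetical planning sketch in plain Ruby
> (not anything HBase ships), assuming a made-up inter-cluster cost matrix
> such as latency: Prim's algorithm picks N-1 peer links instead of a full
> mesh.
>
>     # Hypothetical planning helper: choose replication links via Prim's
>     # algorithm over assumed symmetric inter-cluster costs.
>     require 'set'
>
>     clusters = %w[C1 C2 C3 C4]            # placeholder cluster names
>     cost = {                              # assumed costs (e.g. latency in ms)
>       %w[C1 C2] => 20, %w[C1 C3] => 80, %w[C1 C4] => 90,
>       %w[C2 C3] => 30, %w[C2 C4] => 85, %w[C3 C4] => 25
>     }
>     edge = ->(a, b) { cost[[a, b].sort] }
>
>     in_tree = Set[clusters.first]
>     links = []
>     until in_tree.size == clusters.size
>       a, b = in_tree.to_a.product(clusters - in_tree.to_a)
>                     .min_by { |x, y| edge.call(x, y) }
>       links << [a, b]
>       in_tree << b
>     end
>     puts links.map { |x, y| "#{x} <-> #{y}" }  # each link = one M-M peer pair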
>
>
> On Fri, Nov 8, 2013 at 2:58 PM, Demai Ni <[EMAIL PROTECTED]> wrote:
>
> > Ishan,
> >
> > I have to admit that I am a bit surprised by the need to have data
> > centers in 10 different locations. Well, I guess I shouldn't be, as
> > every company is global now (anyone from Mars yet?).
> >
> > In your case, since there is only one column family, the headache is
> > not as bad. Let's call your clusters C1, C2, ... C10.
> >
> > The safest way for your most critical data is still to set up M-M
> > replication 1-to-(N-1). That is, every cluster adds all the other
> > clusters as its peers. For example, C1 will have C2, C3 ... C10 as its
> > peers; C2 will have C1, C3 ... C10, and so on. That will be a lot of
> > data over the network, although it is the best/fastest way to get all
> > the clusters in sync. I don't like the idea at all (too expensive, for
> > one).
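> >
> > Roughly, that full-mesh setup would look like the following HBase shell
> > commands on C1 (ZooKeeper quorums, table and cf names are placeholders),
> > repeated with the peer list adjusted on each of the 10 clusters:
> >
> >     # Older releases also need hbase.replication=true in hbase-site.xml.
> >     # Ship the cf's edits and add every other cluster as a peer
> >     # (cluster key = zk_quorum:zk_client_port:zk_parent_znode).
> >     alter 'usertable', {NAME => 'cf1', REPLICATION_SCOPE => 1}
> >     add_peer '2', 'c2-zk1,c2-zk2,c2-zk3:2181:/hbase'
> >     add_peer '3', 'c3-zk1,c3-zk2,c3-zk3:2181:/hbase'
> >     # ... and so on through C10, then verify:
> >     list_peers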
> >
> > Now, let's improve it a bit. C1 will set up M-M replication to 2 of the
> > remaining 9, with the distribution carefully planned so that all the
> > clusters get an equal load. A system administrator has to do this
> > manually.
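> >
> > Purely as an illustration of such a manual plan (placeholder names,
> > plain Ruby rather than anything HBase provides): a ring-style assignment
> > where each cluster peers with the next two keeps the source/sink load
> > even.
> >
> >     # Hypothetical planning helper: print which peers each cluster adds.
> >     clusters = (1..10).map { |i| "C#{i}" }
> >     n = clusters.size
> >     clusters.each_with_index do |c, i|
> >       peers = [clusters[(i + 1) % n], clusters[(i + 2) % n]]
> >       puts "#{c}: add_peer for #{peers.join(' and ')}"
> >     end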
> >
> > Now, think about the headaches:
> > 1) What if your company (that is, your manager, who has no idea how
> > difficult it is) decides to have one more column family replicated? How
> > about two more? The load will grow exponentially.
> > 2) What if your company opens a new office in an 11th location? Again,
> > the load grows exponentially.
> > 3) Let's say you are the best administrator and keep nice records of
> > everything (unfortunately, HBase alone doesn't have a good way to
> > maintain a record of what is being replicated where). And then the
> > admin leaves the company? Or this is a global company with 10 admins in
> > different locations. How do they communicate the replication setup?
> >
> > :-) Well, 3) is not too bad. I just like to point it out, as it can be
> > quite true for a company large enough to have 10 locations.
> >
> > Demai
> >
> >
> >
> >
> > On Fri, Nov 8, 2013 at 2:42 PM, Ishan Chhabra <[EMAIL PROTECTED]> wrote:
> >
> > > Ted:
> > > Yes. It is the same table that is being written to from all locations.
> > > A single row could be updated from multiple locations, but our schema
> > > is designed in a manner that writes will be independent and not
> > > clobber each other.
> > >
> > >
> > > On Fri, Nov 8, 2013 at 2:33 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ishan:
> > > > In your use case, the same table is written to in 10 clusters at
> > > > roughly the same time?
> > > >
> > > > Please clarify.
> > > >
> > > >
> > > > On Fri, Nov 8, 2013 at 2:29 PM, Ishan Chhabra <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > @Demai,
> > > > > We actually have 10 clusters in different locations.