HBase >> mail # user >> Setting up NxN replication


Re: Setting up NxN replication
Ted, good point on the 11th location. Thanks
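
To put rough numbers on the 11th-location question, here is a minimal sketch that only counts directed peer links (one link per "cluster X lists cluster Y as a peer"). The function names are made up for illustration and nothing here calls HBase; it assumes master-master pairs along a spanning tree, which is one reading of the approach Ted mentions.

```python
def full_mesh_links(n: int) -> int:
    """Directed peer links when every cluster lists all the others as peers."""
    return n * (n - 1)


def spanning_tree_links(n: int) -> int:
    """Directed peer links when master-master pairs follow a spanning tree."""
    return 2 * (n - 1) if n > 0 else 0


if __name__ == "__main__":
    for n in (10, 11):
        print(n, full_mesh_links(n), spanning_tree_links(n))
```

Going from 10 to 11 clusters adds 20 links in the full mesh but only 2 along a tree, so under these assumptions the full-mesh growth is quadratic in the number of clusters and the tree growth is linear.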

One thing I didn't mention (clearly) is the limitation of 'global'
replication. Imagine all 10 clusters are set up well for the 1st
table:column family. Then, 6 months later, a 2nd table:column family enters the
picture. How do we limit the replication of the 2nd cf to fewer clusters (let's say
3 of the 10)? There are at least two good reasons for such a use case:
1) The 2nd cf is less important (or used by fewer workloads), so there is no need to
waste network/storage to keep all 10 copies (multiplied by the number of
replicas within each cluster).
2) The 2nd cf is so important that legal/business requires the data to be kept
within the U.S., but one of the 10 clusters is in Japan.

If we simply don't create the target table on 7 of the 10 clusters, the queue
will grow and hit its max capacity very quickly. I am not sure about the
consequences of such a situation, but my guess is it won't be pretty.
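
What I have in mind is closer to a per-column-family allow-list. Below is a rough sketch of that idea only; the POLICY table, the names t1:cf1 / t1:cf2, and peers_for are all hypothetical and are not HBase configuration or API.

```python
CLUSTERS = [f"C{i}" for i in range(1, 11)]

# Hypothetical policy: which clusters are allowed to hold each table:column family.
POLICY = {
    "t1:cf1": set(CLUSTERS),        # the original cf, replicated everywhere
    "t1:cf2": {"C1", "C2", "C3"},   # the later cf, limited to 3 of the 10
}


def peers_for(source, family, policy=POLICY):
    """Return the clusters that `source` should ship edits of `family` to."""
    allowed = policy.get(family, set())
    if source not in allowed:
        # A cluster outside the allow-list never holds this cf, so it ships nothing.
        return []
    return sorted(allowed - {source}, key=lambda name: int(name[1:]))


if __name__ == "__main__":
    print(peers_for("C1", "t1:cf1"))
    print(peers_for("C1", "t1:cf2"))
    print(peers_for("C4", "t1:cf2"))
```

With a filter like this on the sending side, edits for the 2nd cf would never be queued for the 7 clusters that should not receive them, instead of piling up against a missing target table.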

Basically, once the business requirement evolves a bit, complexity gets
larger.

Demai
On Fri, Nov 8, 2013 at 3:47 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> bq. how about your company have a new office in the 11th locations?
>
> With minimum spanning tree approach, the increase in load wouldn't be
> exponential.
>
>
> On Fri, Nov 8, 2013 at 2:58 PM, Demai Ni <[EMAIL PROTECTED]> wrote:
>
> > Ishan,
> >
> > I have to admit that I am a bit surprised about the need to have data centers
> > in 10 different locations. Well, I guess I shouldn't be, as every company
> > is global now (anyone from Mars yet?)
> >
> > In your case, since there is only one column family, the headache is not as
> > bad. Let's call your clusters C1, C2, ... C10.
> >
> > The safest way for your most critical data is still to set up M-M
> > replication as 1 to N-1. That is, every cluster adds the rest of the
> > clusters as its peers. For example, C1 will have C2, C3...C10 as its peers;
> > C2 will have C1, C3...C10. Well, that will be a lot of data over the
> > network. Although it is the best/fastest way to get all the clusters synced
> > up, I don't like the idea at all (too expensive, for one).
> >
> > Now, let's improve it a bit. C1 will set up M-M to 2 of the remaining 9,
> > with the distribution carefully planned so that all the clusters get equal
> > load. Well, a system administrator has to do it manually.
> >
> > Now, think about the headaches:
> > 1) What if your company (that is, your manager, who has no idea how
> > difficult it is) decides to have one more column family replicated? How
> > about two more? The load will grow exponentially.
> > 2) What if your company opens a new office in an 11th location? Again, the
> > load grows exponentially.
> > 3) Let's say you are the best administrator and keep a nice record of
> > everything (unfortunately, HBase alone doesn't have a good way to maintain
> > a record of who is being replicated). Then the admin leaves the
> > company. Or this is a global company with 10 admins at different locations.
> > How do they communicate about the replication setup?
> >
> > :-) Well, 3) is not too bad. I'd just like to point it out, as it can be
> > quite true for a company large enough to have 10 locations.
> >
> > Demai
> >
> >
> >
> >
> > On Fri, Nov 8, 2013 at 2:42 PM, Ishan Chhabra <[EMAIL PROTECTED]> wrote:
> >
> > > Ted:
> > > Yes. It is the same table that is being written to from all locations. A
> > > single row could be updated from multiple locations, but our schema is
> > > designed in a manner that writes will be independent and not clobber each
> > > other.
> > >
> > >
> > > On Fri, Nov 8, 2013 at 2:33 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Ishan:
> > > > In your use case, the same table is written to in 10 clusters at roughly
> > > > the same time?
> > > >
> > > > Please clarify.
> > > >
> > > >
> > > > On Fri, Nov 8, 2013 at 2:29 PM, Ishan Chhabra <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > @Demai,
> > > > > We actually have 10 clusters in different locations.