Zookeeper, mail # user - question about data replication


Re: question about data replication
Brian Tarbox 2012-11-06, 16:56
Ted,
Thanks. Is it an actual value I set in zoo.cfg or is it just implied by the
number of nodes in my cluster?
Sorry for being dense :-)

Brian
On Tue, Nov 6, 2012 at 11:37 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> You specify the MINIMUM number of copies when you define the number of
> nodes in your ZK cluster.
>
> The idea is that ZK requires strong consistency and provides guarantees to
> that effect.  The only way to provide those guarantees is for a majority of
> the ZK cluster to agree to and persist all changes.  That is in strong
> contrast to Cassandra, which tries to provide availability instead of
> consistency.
>
> Since ZK requires a majority for every commit, a cluster defined with N
> nodes will require ceiling((N+1)/2) nodes to commit every change.
> Likewise, N is not flexible without some care to make sure that these
> guarantees are maintained.
>
> On Tue, Nov 6, 2012 at 6:50 AM, Brian Tarbox <[EMAIL PROTECTED]>
> wrote:
>
> > I'm working with both Cassandra and Zookeeper so please excuse me if this
> > is a dumb question but does Zookeeper allow/require me to specify the
> > number of copies of data (like Cassandra does) or is it simply the case
> > that if a majority of nodes are up then ALL of my data is available?
> >
> > Thanks.  I'm guessing this should be obvious to me but searching the
> > various docs didn't yield a clear answer.
> >
> > Thanks again.
> >
> > Brian Tarbox
> >
> > --
> > http://about.me/BrianTarbox
> >
>

--
http://about.me/BrianTarbox
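
To make the arithmetic in Ted's reply concrete, here is a minimal sketch in Python. It assumes the ensemble is defined by `server.N=host:port:port` lines in zoo.cfg, so that N is implied by how many such lines there are rather than set as a separate value, and that every listed server is a voting member (no observers). The host names, paths, and helper function names are hypothetical and only there for illustration.

```python
import re

# Hypothetical zoo.cfg contents; host names and paths are placeholders.
ZOO_CFG = """\
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
"""


def ensemble_size(cfg_text):
    """Count the server.N entries, i.e. the N in the formula from the thread."""
    return sum(
        1
        for line in cfg_text.splitlines()
        if re.match(r"\s*server\.\d+\s*=", line)
    )


def quorum_size(n):
    """Smallest strict majority of n nodes: floor(n/2) + 1 == ceiling((n+1)/2)."""
    return n // 2 + 1


def tolerated_failures(n):
    """Nodes that can be down while a majority is still reachable."""
    return n - quorum_size(n)


n = ensemble_size(ZOO_CFG)
print(n, quorum_size(n), tolerated_failures(n))  # prints: 3 2 1
```

Under these assumptions a 3-node ensemble needs 2 nodes to commit a change and survives 1 failure. A 4-node ensemble needs 3 and still survives only 1, which is why odd ensemble sizes are the usual choice.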