Zookeeper >> mail # user >> Zookeeper and multiple data centers


Michael Morello 2012-07-09, 12:01
Jean-Pierre Koenig 2012-07-09, 12:14
Re: Zookeeper and multiple data centers
Hi Jean-Pierre,

Thank you for your update; here are some additional details:
The configuration will be distributed across a multitude of znodes. When I
talk about 256 MB, it is because we plan to have hundreds of them. We will
keep the JSON data as small as possible.
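For context, a quick back-of-the-envelope check (the znode count of 300 below is an assumption standing in for "hundreds"): 256 MB spread over a few hundred znodes averages close to ZooKeeper's default 1 MB per-znode limit (jute.maxbuffer), so the data would need to be split finer than that to stay safely under the cap.

```java
public class ZnodeSizing {
    public static void main(String[] args) {
        long totalBytes = 256L * 1024 * 1024;       // 256 MB of configuration in total
        int znodeCount = 300;                       // assumption: "hundreds" taken as ~300
        long avgPerZnode = totalBytes / znodeCount; // average bytes per znode
        long juteMaxBuffer = 1024 * 1024;           // ZooKeeper's default per-znode cap (1 MB)
        System.out.println(avgPerZnode);            // 894784 -> only just under the cap
        System.out.println(avgPerZnode < juteMaxBuffer);
    }
}
```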
Regarding connectivity, we use a private network, and according to the
SLA a network availability of 99.99% is expected between data centers.
Knowing this, do you still think that we will run
into SessionExpired and ConnectionLoss issues?

Best regards,
Michael

2012/7/9 Jean-Pierre Koenig <[EMAIL PROTECTED]>

> [...]
> But you should beware of large payloads. ZK is not designed to handle
> huge amounts of data [....] I highly recommend not more
> than 1024 KB of payload. The other point you should consider here is
> (network) latency. I guess your ZK clients (your 50) will see a lot of
> SessionExpired or ConnectionLoss exceptions, depending on the
> connectivity of your DCs among one another.
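A common way to soften transient ConnectionLoss errors (SessionExpired is different: it requires creating a new session and re-registering watches) is to retry the failed operation with capped exponential backoff. A minimal sketch of the delay schedule such a client wrapper might use; the class and method names here are illustrative, not part of the ZooKeeper API:

```java
import java.util.ArrayList;
import java.util.List;

public class RetryBackoff {
    // Delays in ms for successive retries: base, 2*base, 4*base, ... capped at capMs.
    static List<Long> schedule(long baseMs, long capMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long d = baseMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(Math.min(d, capMs));
            d = Math.min(d * 2, capMs);
        }
        return delays;
    }

    public static void main(String[] args) {
        // With 20-60 ms inter-DC latency, the base delay starts well above the RTT.
        System.out.println(schedule(200, 5000, 6)); // [200, 400, 800, 1600, 3200, 5000]
    }
}
```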
>
> On Mon, Jul 9, 2012 at 2:01 PM, Michael Morello
> <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I work on a project and would be happy to have your thoughts on our
> > requirements and how Zookeeper meets them.
> >
> > The facts:
> > * We need to share configuration items between 10 data centers.
> > Configuration must be synchronized between data centers (actually we can
> > tolerate a few seconds of inconsistency)
> > * Configuration items will be serialized in JSON, and together they can
> > fit into 256 MB of heap
> > * The R/W ratio is 90% read and 10% write, and the number of clients
> > should be low (50 to 100 in each data center)
> > * A client running in a DC can freely communicate with a host in
> > another DC
> > * Latency between data centers is 20 to 60 ms
> > * Only 1 host (machine) per data center can be dedicated to a Zookeeper
> > process: the machines are big IBM AIX boxes, and only one is dedicated
> > to this project in each DC
> > * The project must survive a data center crash
> >
> > Since the configuration items are small, must be kept synchronized, and
> > need a fail-over mechanism, Zookeeper appears to be a good candidate,
> > but I'm not sure how to deploy it, mainly because we can start only one
> > Zookeeper process in each data center.
> > My idea is to deploy 1 server in only 5 of the DCs. This way there are
> > 5 Zookeeper servers spread across the country and we can lose 2 DCs. Of
> > course all the clients in all the data centers must know where the 5
> > Zookeeper servers are.
> > Do you see any downside to doing this?
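[As a rough sketch of what such a 5-server cross-DC ensemble could look like: hostnames below are placeholders, and the timeouts are illustrative values scaled up from the LAN defaults to tolerate 20-60 ms inter-DC latency.]

```
# Sketch of a zoo.cfg for one voting server per DC, in 5 of the 10 DCs.
# Hostnames are placeholders.
tickTime=2000
# initLimit/syncLimit raised above LAN defaults for WAN latency:
# followers get 20*tickTime = 40 s to connect and sync with the leader,
# and may lag up to 10*tickTime = 20 s before being dropped.
initLimit=20
syncLimit=10
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-dc1.example.com:2888:3888
server.2=zk-dc2.example.com:2888:3888
server.3=zk-dc3.example.com:2888:3888
server.4=zk-dc4.example.com:2888:3888
server.5=zk-dc5.example.com:2888:3888
```

[With 5 voting servers, the ensemble keeps a quorum of 3 even if 2 DCs fail, which matches the stated requirement.]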
> >
> > I know that Zookeeper has been designed to run on a LAN and on "commodity
> > hardware", but given the R/W ratio and the latency, do you think it is
> > a good idea to deploy it this way?
>