Zookeeper, mail # user - Zookeeper and multiple data centers

Zookeeper and multiple data centers
Michael Morello 2012-07-09, 12:01
Hi all,

I work on a project and I would be happy to have your thoughts about our
requirements and how Zookeeper meets them.

The facts:
* We need to share configuration items between 10 data centers.
Configuration must be synchronized between data centers (actually we can
tolerate a few seconds of inconsistency)
* Configuration items will be serialized in JSON and together they can fit
into 256MB of heap
* R/W ratio is 90% read and 10% write, and the number of clients should be
low (50 to 100 in each data center)
* A client running in a DC can freely communicate with a host in another DC
* Latency between data centers is 20 to 60 ms
* Only 1 host (machine) per data center can be dedicated to a Zookeeper
process: the machines are big IBM AIX boxes, and only one per DC is
dedicated to this project
* Project must survive a data center crash
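ZooKeeper keeps its entire data tree in memory and, by default, caps a single znode at just under 1 MB (the `jute.maxbuffer` setting), so a 256MB configuration set would have to be spread across many znodes rather than stored as one JSON document. A minimal sketch, assuming one znode per configuration item under a hypothetical `/config` parent path; `serialize_items` is an illustrative helper, not a ZooKeeper API:

```python
import json

# Assumption: the default per-znode limit (jute.maxbuffer) is 0xfffff bytes,
# just under 1 MiB. Adjust if the ensemble is configured differently.
DEFAULT_ZNODE_LIMIT = 0xFFFFF


def serialize_items(items: dict, limit: int = DEFAULT_ZNODE_LIMIT) -> dict:
    """Serialize each configuration item to its own compact JSON payload.

    Returns a mapping from a hypothetical znode path (/config/<name>) to the
    UTF-8 encoded payload. Raises ValueError if any single item would not
    fit in one znode.
    """
    payloads = {}
    for name, value in items.items():
        data = json.dumps(value, separators=(",", ":")).encode("utf-8")
        if len(data) > limit:
            raise ValueError(
                f"item {name!r} is {len(data)} bytes, over the {limit}-byte limit"
            )
        payloads[f"/config/{name}"] = data
    return payloads


# Hypothetical item: one small settings object becomes one small znode.
example = serialize_items({"timeouts": {"read_ms": 500}})
```

Here `example` is `{"/config/timeouts": b'{"read_ms":500}'}`. Keeping items in separate znodes also lets clients set watches on just the items they care about instead of re-reading the whole configuration on every change.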

Since the configuration items are small, must be synchronized, and need a
fail-over mechanism, Zookeeper appears to be a good candidate, but I'm not
sure how to deploy it, mainly because we can run only one Zookeeper process
in each data center.
My idea is to deploy one Zookeeper server in only 5 of the 10 data centers.
This way there are 5 servers (one elected leader and 4 followers) spread
across the country, and we can lose 2 DCs. Of course, clients in every data
center must know the addresses of all 5 Zookeeper servers.
Do you see any downside to doing this?
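The "lose 2 DCs" figure follows from majority quorums: an ensemble of n voting servers needs floor(n/2) + 1 of them alive to elect a leader and commit writes. The sketch below, using hypothetical host names, checks that arithmetic and builds the comma-separated `host:port` connection string that ZooKeeper clients accept (2181 is the conventional client port, assumed here):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Voting servers that can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


def connect_string(hosts, port: int = 2181) -> str:
    """Comma-separated host:port list, the format ZooKeeper clients take."""
    return ",".join(f"{h}:{port}" for h in hosts)


# Hypothetical host names: one ZooKeeper server in each of 5 of the 10 DCs.
servers = [f"zk-dc{i}.example.com" for i in range(1, 6)]

# 5 servers: quorum of 3, so any 2 server-hosting DCs can fail.
assert tolerated_failures(len(servers)) == 2
# 10 servers would survive no more failures than 9 (4 either way).
assert tolerated_failures(10) == tolerated_failures(9) == 4

# Every client, in every DC, would be configured with the same string.
ZK_HOSTS = connect_string(servers)
```

The comparison with 9 and 10 servers suggests why an odd number of voters is preferred: adding a tenth server raises the quorum to 6 without letting the ensemble survive an extra failure. A downside of the 5-of-10 layout is that clients in the 5 DCs without a server always cross the WAN, for reads as well as writes.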

I know that Zookeeper was designed to run on a LAN and on "commodity
hardware", but given the R/W ratio and the latency, do you think it is a
good idea to deploy it this way?
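A rough way to reason about the latency question: in ZooKeeper a read is answered by whichever server the client is connected to (possibly slightly stale, which matches the tolerance for a few seconds of inconsistency), while a write is forwarded to the leader and committed only after a majority of the ensemble, leader included, has acknowledged it. The model below is a back-of-the-envelope sketch under those assumptions; it ignores transaction-log fsync and processing time, and the RTT values in the example are hypothetical picks from the 20 to 60 ms range mentioned above:

```python
def quorum_commit_ms(leader_to_follower_rtts_ms):
    """Time for the leader to collect acks from a majority, itself included.

    The leader counts as one vote, so it waits for the floor(n/2)-th fastest
    follower ack, where n is the ensemble size (followers + leader).
    """
    if not leader_to_follower_rtts_ms:
        return 0  # single-server ensemble: the leader alone is the quorum
    n = len(leader_to_follower_rtts_ms) + 1
    follower_acks_needed = n // 2
    return sorted(leader_to_follower_rtts_ms)[follower_acks_needed - 1]


def write_latency_ms(client_rtt, server_to_leader_rtt, leader_to_follower_rtts):
    # client <-> its server, plus server <-> leader (forward and commit),
    # plus one proposal/ack round from the leader to a majority.
    return client_rtt + server_to_leader_rtt + quorum_commit_ms(leader_to_follower_rtts)


def mean_latency_ms(read_ms, write_ms, read_ratio=0.9):
    """Average request latency for a given read/write mix."""
    return read_ratio * read_ms + (1 - read_ratio) * write_ms


# Hypothetical client in a DC without a ZooKeeper server.
read = 40  # one WAN round trip to the connected server
write = write_latency_ms(40, 30, [20, 35, 50, 60])
```

With these numbers a write costs about 105 ms and a read about 40 ms, so the 90/10 mix averages roughly 46.5 ms per request. Because reads dominate and never wait on the quorum, the WAN mostly affects the 10% of writes; placing the leader in a well-connected DC (lowest RTT to its peers) would shrink the quorum term.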

Thanks for your comments

Best regards,

Jean-Pierre Koenig 2012-07-09, 12:14
Michael Morello 2012-07-09, 14:29