Zookeeper, mail # user - Zookeeper and multiple data centers


Michael Morello 2012-07-09, 12:01
Jean-Pierre Koenig 2012-07-09, 12:14
Re: Zookeeper and multiple data centers
Michael Morello 2012-07-09, 14:29
Hi Jean-Pierre,

Thank you for your update. Here are some additional details:
The configuration will be distributed across many znodes; when I mention
256MB, that is the total, because we plan to have hundreds of znodes. We
will keep the JSON data as small as possible.
Regarding connectivity, we use a private network and, according to the
SLA, a network availability of 99.99% is expected between data centers.
Knowing this, do you still think that we will run
into SessionExpired and ConnectionLoss issues?

Best regards,
Michael
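
As a rough sanity check on the two figures in this message, the sketch
below estimates the average znode size if 256MB is spread over a few
hundred znodes, and the downtime that a 99.99% availability allows per
year. The znode counts (100, 500, 900) are assumptions chosen only to
illustrate "hundreds", and the SLA does not say how downtime is
distributed, so the yearly figure is a budget, not a prediction of how
long any single outage lasts.

```python
# Back-of-the-envelope figures only; nothing here is measured.
MIB = 1024 * 1024


def avg_znode_bytes(total_bytes, znode_count):
    """Average payload per znode if the data is spread evenly."""
    return total_bytes / znode_count


def yearly_downtime_minutes(availability):
    """Minutes per (365-day) year that the SLA allows the link to be down."""
    return (1.0 - availability) * 365 * 24 * 60


total = 256 * MIB
for count in (100, 500, 900):  # assumed znode counts
    kib = round(avg_znode_bytes(total, count) / 1024)
    print(count, "znodes ->", kib, "KiB each on average")

print(round(yearly_downtime_minutes(0.9999), 1), "minutes of downtime per year")
```

With only 100 znodes the average payload would be well above the 1024 KB
recommended in the quoted reply, so the exact znode count matters. The
downtime budget is about 52.6 minutes per year; whether clients see
SessionExpired depends on how long each individual interruption lasts
compared with the session timeout.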

2012/7/9 Jean-Pierre Koenig <[EMAIL PROTECTED]>

> [...]
> But you should beware of large payloads. ZK is not designed to handle
> huge amounts of data [....] I highly recommend not more
> than 1024 KB payload.  The other point you should consider here is
> (network) latency. I guess your ZK clients (your 50) will see a lot of
> SessionExpired or ConnectionLoss exceptions, depending on the
> connectivity of your DCs among one another.
>
> On Mon, Jul 9, 2012 at 2:01 PM, Michael Morello
> <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I work on a project and I would be happy to have your thoughts about our
> > requirements and how Zookeeper meets them.
> >
> > The facts:
> > * We need to share configuration items between 10 data centers.
> > Configuration must be synchronized between data centers (actually we can
> > tolerate a few seconds of inconsistency)
> > * Configuration items will be serialized in JSON and together they can
> > fit into 256MB of heap
> > * R/W ratio is 90% read and 10% write and client number should be low (50
> > to 100 in each data center)
> > * A client running in a DC can freely communicate with a host in
> > another DC
> > * Latency between data centers is 20 to 60 ms
> > * Only 1 host (machine) per data center can be dedicated to a Zookeeper
> > process: the machines are big IBM AIX boxes, and only one is dedicated
> > to this project in each DC
> > * Project must survive a data center crash
> >
> > Since configuration items are small, they must be synchronized, and we
> > need a fail-over mechanism, Zookeeper appears to be a good candidate, but
> > I'm not sure how to deploy it, mainly because we can start only one
> > Zookeeper process in each data center.
> > My idea is to deploy 1 follower in only 5 DCs. This way there are 5
> > followers all over the country (and we can lose 2 DCs). Of course all the
> > clients in all the data centers must know where the 5 Zookeeper
> > servers are.
> > Do you see any downside to doing this?
> >
> > I know that Zookeeper has been designed to run on a LAN and on "commodity
> > hardware" but regarding the R/W ratio and the latency do you think that
> > it is a good idea to deploy it this way?
>
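
The fault tolerance claimed in the quoted proposal (5 servers, 2 data
centers lost) follows from majority quorums: ZooKeeper needs more than
half of the voting servers to be reachable in order to keep serving. A
minimal sketch of that arithmetic follows; the function names are
illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(voters):
    """Smallest number of voting servers that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voting servers can be lost while a majority remains."""
    return voters - quorum_size(voters)


for n in (3, 5, 6, 7):
    print(n, "servers: quorum", quorum_size(n),
          "tolerates", tolerated_failures(n), "failures")
```

For 5 servers the quorum is 3, so 2 data centers can be lost, which
matches the proposal. An even-sized ensemble such as 6 tolerates no more
failures than 5 does.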