-
Question about sharing Zookeeper connections
YUNG-LIN HO 2011-02-18, 09:24
Hi All,
My project uses a few zookeeper libraries, each of which manages one (or more) zookeeper clients by itself:
- my own code uses a zookeeper client in Scala which supports a connection pool.
- cages, an easy-to-use distributed lock interface implemented on zookeeper. All locks created by cages share the same connection.
- camel-zookeeper, an open-source ESB-like program. Each component instance creates a new connection to the zookeeper server.
Zookeeper clients try to keep their sessions alive by sending a ping request every 2 seconds. If the libraries in an application do not share connections with each other, they could flood the zookeeper server with unnecessary requests and drag down its performance.
I am wondering whether there is any connection manager in the Hadoop/Zookeeper project that helps users share connections.
best regards, -yunglin
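To put the keep-alive concern in rough numbers, here is a minimal back-of-the-envelope sketch. It assumes the 2-second ping interval quoted in the message above (not a measured value), and the class and method names are hypothetical.

```java
/** Rough estimate of keep-alive traffic; the names here are illustrative only. */
final class PingLoad {
    private PingLoad() {}

    /**
     * Pings per second reaching the servers for a given number of open
     * connections, assuming every connection pings at the same fixed interval.
     */
    static double pingsPerSecond(int connections, double pingIntervalSeconds) {
        if (pingIntervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return connections / pingIntervalSeconds;
    }
}
```

Under that assumption, 30 unshared connections would generate 15 pings per second in total, and it would take 2000 connections to reach 1000 pings per second.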
-
Re: Question about sharing Zookeeper connections
Ted Dunning 2011-02-18, 17:31
On Fri, Feb 18, 2011 at 1:24 AM, YUNG-LIN HO <[EMAIL PROTECTED]> wrote:
> Because zookeeper clients will try to keep session alive by sending a ping
> request every 2 seconds. If libraries in an application do not share
> connections with each other, they would flood the zookeeper server with
> unnecessary requests and drag down performance of the server.
Make sure that you have a valid reason to worry first.
Do you have thousands of clients?
If not, these keep-alives are likely to be undetectable, load-wise.
> I am wondering is there any connection manager exists in the
> Hadoop/Zookeeper project that helps users to share connections?
Yes. Zookeeper.
Just open a single connection and pass it around via a singleton of some kind or your favorite dependency injection technique.
This isn't always a great idea since your disconnect and expiration strategies might differ between different uses in important ways.
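As a rough illustration of the "singleton of some kind" idea, here is a minimal sketch of a holder that lazily creates one object and hands the same instance to every caller. `SharedHandle` is a hypothetical name, not part of ZooKeeper. In a real application the factory would presumably wrap `new ZooKeeper(connectString, sessionTimeout, watcher)` and deal with its checked `IOException`; the sketch is generic so that it runs without the ZooKeeper jar.

```java
import java.util.function.Supplier;

/** Lazily creates one instance of T and returns that same instance to every caller. */
final class SharedHandle<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedHandle(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Creates the instance on first use; later calls return the same object. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

Each library would then be given the same `SharedHandle` (directly or through dependency injection) instead of opening its own connection. The caveat above still applies: everyone holding the handle shares one session, so a disconnect or expiration hits all of them at once.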
-
Re: Question about sharing Zookeeper connections
Benjamin Reed 2011-02-18, 17:58
it does seem like a good idea to make multiple zk handles share a connection, but as ted points out, they may have different timeouts, which would make the sharing logic quite complicated. i think the implementation might also be quite complicated. having said that, if someone could come up with a simple and correct connection sharing implementation, we (or at least i) would be open to it.
ben
-
Re: Question about sharing Zookeeper connections
Peco Karayanev 2011-02-18, 19:13
Hi,
I slightly disagree with the priority given to a "connection pool" for zookeeper. I had to implement connection pooling/reuse for a smaller environment (under 50 nodes). I have a toolchain that can be composed of parts, and each part needed some synchronization through zookeeper. So even on a smaller system, these heavier toolchains started using a lot of physical sessions and connections. Hence the connection pool. Also, from a performance perspective, establishing a TCP connection for every session adds latency overhead.
Cheers
Peco
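For readers wondering what connection reuse of this kind might look like, here is a minimal sketch of a reference-counted registry. It is not the implementation described in the message above. `HandlePool`, `acquire`, and `release` are hypothetical names, and the resource type only needs a `close()` method, so the sketch runs without the ZooKeeper jar.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** One shared resource per key, closed when its last user releases it. */
final class HandlePool<T extends AutoCloseable> {
    private static final class Entry<T> {
        final T handle;
        int users;
        Entry(T handle) { this.handle = handle; }
    }

    private final Map<String, Entry<T>> entries = new HashMap<>();
    private final Function<String, T> opener;

    HandlePool(Function<String, T> opener) {
        this.opener = opener;
    }

    /** Returns the shared resource for the key, opening it on first use. */
    synchronized T acquire(String key) {
        Entry<T> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(opener.apply(key));
            entries.put(key, entry);
        }
        entry.users++;
        return entry.handle;
    }

    /** Drops one user; the resource is closed once nobody uses it any more. */
    synchronized void release(String key) {
        Entry<T> entry = entries.get(key);
        if (entry == null) {
            throw new IllegalStateException("not acquired: " + key);
        }
        if (--entry.users == 0) {
            entries.remove(key);
            try {
                entry.handle.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed for " + key, e);
            }
        }
    }
}
```

The key would typically be the connect string, so parts of a toolchain that talk to the same ensemble end up on one physical connection and pay the TCP setup cost only once. The sketch ignores the problem raised earlier in the thread: users with different timeout or expiration requirements cannot simply share one entry.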
-
Re: Question about sharing Zookeeper connections
Ted Dunning 2011-02-18, 19:40
Most applications just share a single connection for all uses of a single class. That might be viewed as the logically simplest extreme of a simple connection pool.
Why did you need more than one connection for each kind of use?
-
Re: Question about sharing Zookeeper connections
Thomas Koch 2011-02-19, 15:18
Benjamin Reed:
> it does seem like a good idea to make multiple zk handles share a
> connection, but as ted points out, they may have different timeouts, which
> would make the sharing logic quite complicated. i think the implementation
> might also be quite complicated. having said that, if someone could come up
> with a simple and correct connection sharing implementation, we (or at
> least i) would be open to it.
Hi,
I agree with Benjamin that it is unlikely that multiple ZooKeeper connections in one application will ever become a bottleneck. However, I have some patches in the ZK jira (most notably ZOOKEEPER-911) and in my mind that have the side effect that one ClientCnxn could easily be shared by multiple application parts, even with differing changeroots.
Best regards,
Thomas Koch, http://www.koch.ro
-
Re: Question about sharing Zookeeper connections
Ishaaq Chandy 2011-03-10, 21:42
Sorry to resurrect an old discussion, but this is pertinent to me.
So, just to clarify, what you're saying is: if you've got a subsystem in an app that uses ZK, then you should be fine (i.e., both from a thread-safety perspective and from a performance perspective) sharing a single connection between all the threads running code in that subsystem (this is what I interpreted your phrase "kind of use" to mean), even if, say, you expect thousands of calls to happen on that one connection over the span of, say, a couple of minutes?
On the other hand, are you also saying that it is advisable to have separate ZK connections for each subsystem that requires it? Even if they are all running in the same JVM?
Ishaaq