Re: Zookeeper Implementation
Eric Newton 2013-07-16, 13:39
On Tue, Jul 16, 2013 at 9:23 AM, Drew Thornton wrote:
> Thank you, but that is not the situation.
> If one zookeeper node is shut down (or fails, or whatever) and the rest of
> the ensemble stays up, the tablet servers attached as clients to that node
> immediately fail. If one of those clients happens to be the master, the
> cluster goes down.
> Accumulo does not seem to be failing over to the remaining zookeeper
> nodes, and this forces me to restart the individual tablet servers again.
> The zookeeper ensemble is very stable and has plenty of
> bandwidth/memory/processing, so taking one node out of five doesn't
> crash the zookeepers, just the tablet servers...
> Drew Thornton
> Data Tactics Corporation
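The arithmetic behind the observation that losing one of five nodes does not take the ensemble down: a ZooKeeper ensemble keeps serving only while a strict majority of its servers is up. A minimal sketch; the function names are illustrative, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority for an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

For the five-node ensemble in this thread that gives a quorum of 3 and room for two failed nodes, so the ensemble itself survives a single restart; the failures described here are on the client side.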
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Sent: Monday, July 15, 2013 3:56 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Zookeeper Implementation
> I have seen this behavior (with Accumulo 1.4.4, though) when one of the
> Zookeeper nodes was restarted and then, after a few seconds' delay, another
> node was restarted.
> I did not investigate the issue, but it seems that if you want to change
> the Zookeeper configuration and restart all nodes, you have to wait a few
> minutes between restarts.
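The advice to wait between restarts can be made concrete by confirming that each restarted server is serving again before touching the next one. A minimal sketch, assuming the servers answer ZooKeeper's `srvr` four-letter command on the client port and that `restart` is a hypothetical callable you supply (for example, one that runs your init script over SSH):

```python
import socket
import time
from typing import Callable, Iterable, Optional


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 5.0) -> str:
    """Send a four-letter command and return the whole reply.

    The server closes the connection after answering, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply: str) -> Optional[str]:
    """Return 'leader', 'follower', etc., or None if no Mode line is present."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def wait_until(check: Callable[[], bool], timeout_s: float = 300.0,
               interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # e.g. connection refused while the JVM restarts: keep polling
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def rolling_restart(nodes: Iterable[str], restart: Callable[[str], None],
                    port: int = 2181) -> None:
    """Restart one node at a time, waiting until each one is serving again."""
    for host in nodes:
        restart(host)  # hypothetical hook supplied by the caller
        serving = wait_until(
            lambda: parse_mode(four_letter_word(host, port, "srvr")) is not None)
        if not serving:
            raise RuntimeError(f"{host} did not rejoin the ensemble; stopping")
```

This only checks the ensemble side; it does not change how Accumulo processes connected to the restarted server react to losing it.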
> On 7/15/13, Drew Thornton <[EMAIL PROTECTED]> wrote:
> > Yes, maxClientCnxns=100. I've also used full hostnames and ports
> > in accumulo-site.xml.
> > I noticed the pattern of crashes when I was testing Zookeeper's JVM
> > garbage collector settings. I would take one node out at a time to
> > restart its JVM, and individual tablet servers (and eventually the
> > master) would crash depending on which Zookeeper node I took down.
> > Drew
> > From: Eric Newton [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, July 15, 2013 2:31 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Zookeeper Implementation
> > You are giving the names of all the zookeeper nodes in
> > accumulo-site.xml, right?
> > <property>
> >   <name>instance.zookeeper.host</name>
> >   <value>zoo1,zoo2,zoo3,zoo4,zoo5</value>
> > </property>
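The value of `instance.zookeeper.host` ends up as the ZooKeeper client's connect string: a comma-separated list of `host[:port]` entries, with port 2181 assumed when none is given. The sketch below parses such a string and models failover in a deliberately simplified way, assuming the client shuffles the list once and then cycles through it on each disconnect. The function names are illustrative, not ZooKeeper's API.

```python
import itertools
import random
from typing import Iterator, List, Tuple

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def parse_connect_string(connect_string: str) -> List[Tuple[str, int]]:
    """Split 'zoo1,zoo2:2182' into [('zoo1', 2181), ('zoo2', 2182)].

    Handles plain hostnames and IPv4 addresses only; an optional
    chroot suffix such as '/app' is dropped.
    """
    hosts_part = connect_string.split("/", 1)[0]
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else DEFAULT_ZK_PORT))
    return servers


def failover_order(servers: List[Tuple[str, int]],
                   rng: random.Random) -> Iterator[Tuple[str, int]]:
    """Simplified model: shuffle once, then try servers round-robin forever."""
    shuffled = list(servers)
    rng.shuffle(shuffled)
    return itertools.cycle(shuffled)
```

With all five hosts listed, losing `zoo3` should only cost a client a reconnect to another entry; a connect string naming a single host leaves nothing to fail over to.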
> > Have you increased maxClientCnxns as described in the Accumulo README?
> > -Eric
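`maxClientCnxns` in each server's `zoo.cfg` caps the number of concurrent connections one client IP address may open to that server, and 0 disables the limit. Because all Accumulo processes on a host share one IP, it can be worth checking the setting on every server. A minimal sketch, assuming the config text has already been fetched and taking 100 (the value reported in this thread) as the target; it handles only the plain `key=value` form, not full Java properties syntax.

```python
from typing import Dict, Optional


def parse_zoo_cfg(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def max_client_cnxns_ok(cfg_text: str, required: int = 100) -> Optional[bool]:
    """True if the per-IP limit is at least `required` (0 means unlimited).

    Returns None when the key is absent, since the built-in default
    differs between ZooKeeper releases.
    """
    value = parse_zoo_cfg(cfg_text).get("maxClientCnxns")
    if value is None:
        return None
    limit = int(value)
    return limit == 0 or limit >= required
```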
> > On Mon, Jul 15, 2013 at 2:04 PM, Drew Thornton
> > <[EMAIL PROTECTED]> wrote:
> > Hello,
> > I'm running a small cluster of 10 tablet servers and 5 zookeeper nodes
> > (CDH 4.3, Zookeeper 3.4.5, Accumulo 1.5.0).
> > I have noticed that when a zookeeper node dies, the connected tablet
> > server clients also die instead of failing over to another zookeeper.
> > If the clients on the failed zookeeper are only tablet servers,
> > Accumulo reassigns the tablets. If the Accumulo Master is one of the
> > clients on the failed node, then the master goes down, and the cluster
> > with it.
> > Anyone else have this problem or know of a workaround/solution to keep
> > the cluster up when zookeeper changes state?
> > Thanks,
> > Drew
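One factor in whether failover is transparent: a client survives losing its server only if it reconnects elsewhere before its session expires, and when a session expires its ephemeral nodes (such as the locks Accumulo servers hold) are deleted. Accumulo requests its timeout through `instance.zookeeper.timeout`, and each ZooKeeper server clamps the request to [`minSessionTimeout`, `maxSessionTimeout`], which default to 2 and 20 times `tickTime`. A small sketch of that clamp; the `tickTime` of 2000 ms is only an example value.

```python
from typing import Optional


def negotiated_session_timeout(requested_ms: int, tick_time_ms: int,
                               min_ms: Optional[int] = None,
                               max_ms: Optional[int] = None) -> int:
    """Clamp a requested session timeout the way a ZooKeeper server does.

    Without explicit min/maxSessionTimeout settings, the bounds default
    to 2 * tickTime and 20 * tickTime.
    """
    lower = min_ms if min_ms is not None else 2 * tick_time_ms
    upper = max_ms if max_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    tick = 2000  # example tickTime in milliseconds
    for requested in (1000, 30000, 60000):
        print(requested, "->", negotiated_session_timeout(requested, tick))
```

With `tickTime=2000` the bounds are 4000 and 40000 ms, so a 30000 ms request is granted as-is while a 60000 ms request is cut to 40000 ms.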