|
Gunnar Wagenknecht
2011-05-18, 06:45
Patrick Hunt
2011-05-19, 19:14
Gunnar Wagenknecht
2011-05-20, 06:57
Patrick Hunt
2011-05-20, 17:23
|
-
Timing issue when running embedded?Gunnar Wagenknecht 2011-05-18, 06:45
Hi,
I have an application that uses ZooKeeper. There is an ensemble in production. But in order to simplify development the application will start an embedded ZooKeeper server when started in development mode. We are experiencing a timing issue with ZooKeeper 3.3.3 and I was wondering if this is allowed to be happen or if we did something wrong when starting the embedded server. Basically, we have a watch registered using an #exists call and watch code like the following. @Override public void process(final WatchedEvent event) { switch (event.getType()) { ... case NodeCreated: pathCreated(event.getPath()); break; ... } } @Override protected void pathCreated(final String path) { // process events only for this node if (!isMyPath(path)) return; try { loadNode(); // calls zk.getData(String, Watcher, Stat) } catch (final Exception e) { // got NoNodeException here (but not when debugging) log(..., e) } } >From inspecting the logs we noticed a NoNodeException. When setting breakpoints on #loadNode and stepping through we don't get the exception. But when setting a breakpoint on #log only we got a hit and could confirm the issue this way. The path is actually some levels deep. All the parent paths don't exist either so they are created as well. However, no exception is thrown fro them. The sequence is as follows. /l1 --> watch triggered, getData, no exception /l1/l2 --> watch triggered, getData, no exception /l1/l2/l3 --> watch triggered, getData, no exception /l1/l2/l3/l4 --> watch triggered, getData, no exception /l1/l2/l3/l4/l5 --> watch triggered, getData, no exception /l1/l2/l3/l4/l5/l6 --> watch triggered, getData, NoNodeException The only difference is that all paths up to including l5 do not actually have any data. Only l6 has some data. Could there be some latency issues? For completeness, the embedded server is started as follows. // disable LOG4J JMX stuff System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString()); // get directories final File dataDir = new File(config.getDataLogDir()); final File snapDir = new File(config.getDataDir()); // clean old logs PurgeTxnLog.purge(dataDir, snapDir, 3); // create standalone server zkServer = new ZooKeeperServer(); zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir)); zkServer.setTickTime(config.getTickTime()); zkServer.setMinSessionTimeout(config.getMinSessionTimeout()); zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout()); factory = new NIOServerCnxn.Factory(config.getClientPortAddress(), config.getMaxClientCnxns()); // start server LOG.info("Starting ZooKeeper standalone server."); try { factory.startup(zkServer); } catch (final InterruptedException e) { LOG.warn("Interrupted during server start.", e); Thread.currentThread().interrupt(); } -Gunnar -- Gunnar Wagenknecht [EMAIL PROTECTED] http://wagenknecht.org/
-
Re: Timing issue when running embedded?Patrick Hunt 2011-05-19, 19:14
Hi Gunnar this is great detective work. It certainly sounds like it
might be some timing issue or possible bug in ZK exposed by this embedded case. A few questions: 1) in this dev/embedded case you only have a single zk server, correct? 2) you have 2 clients in this case, one creating the znode and one watching (vs say a single client doing both) It would be great if you were able to construct a test case that reproduces this, is that something you could provide? I'd also suggest creating a JIRA to track this issue: https://issues.apache.org/jira/browse/ZOOKEEPER Patrick On Tue, May 17, 2011 at 11:45 PM, Gunnar Wagenknecht <[EMAIL PROTECTED]> wrote: > Hi, > > I have an application that uses ZooKeeper. There is an ensemble in > production. But in order to simplify development the application will > start an embedded ZooKeeper server when started in development mode. We > are experiencing a timing issue with ZooKeeper 3.3.3 and I was wondering > if this is allowed to be happen or if we did something wrong when > starting the embedded server. > > > Basically, we have a watch registered using an #exists call and watch > code like the following. > > @Override > public void process(final WatchedEvent event) { > switch (event.getType()) { > ... > case NodeCreated: > pathCreated(event.getPath()); > break; > ... > } > } > > @Override > protected void pathCreated(final String path) { > // process events only for this node > if (!isMyPath(path)) > return; > try { > loadNode(); // calls zk.getData(String, Watcher, Stat) > } catch (final Exception e) { > // got NoNodeException here (but not when debugging) > log(..., e) > } > } > > > > From inspecting the logs we noticed a NoNodeException. When setting > breakpoints on #loadNode and stepping through we don't get the > exception. But when setting a breakpoint on #log only we got a hit and > could confirm the issue this way. > > The path is actually some levels deep. All the parent paths don't exist > either so they are created as well. However, no exception is thrown fro > them. The sequence is as follows. > > /l1 --> watch triggered, getData, no exception > /l1/l2 --> watch triggered, getData, no exception > /l1/l2/l3 --> watch triggered, getData, no exception > /l1/l2/l3/l4 --> watch triggered, getData, no exception > /l1/l2/l3/l4/l5 --> watch triggered, getData, no exception > /l1/l2/l3/l4/l5/l6 --> watch triggered, getData, NoNodeException > > The only difference is that all paths up to including l5 do not actually > have any data. Only l6 has some data. Could there be some latency issues? > > For completeness, the embedded server is started as follows. > > // disable LOG4J JMX stuff > System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString()); > > // get directories > final File dataDir = new File(config.getDataLogDir()); > final File snapDir = new File(config.getDataDir()); > > // clean old logs > PurgeTxnLog.purge(dataDir, snapDir, 3); > > // create standalone server > zkServer = new ZooKeeperServer(); > zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir)); > zkServer.setTickTime(config.getTickTime()); > zkServer.setMinSessionTimeout(config.getMinSessionTimeout()); > zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout()); > > factory = new NIOServerCnxn.Factory(config.getClientPortAddress(), > config.getMaxClientCnxns()); > > // start server > LOG.info("Starting ZooKeeper standalone server."); > try { > factory.startup(zkServer); > } catch (final InterruptedException e) { > LOG.warn("Interrupted during server start.", e); > Thread.currentThread().interrupt(); > } > > > -Gunnar > > > -- > Gunnar Wagenknecht > [EMAIL PROTECTED] > http://wagenknecht.org/ > >
-
Re: Timing issue when running embedded?Gunnar Wagenknecht 2011-05-20, 06:57
Am 19.05.2011 21:14, schrieb Patrick Hunt:
> I'd also suggest creating a JIRA to track this issue: > https://issues.apache.org/jira/browse/ZOOKEEPER Filed as https://issues.apache.org/jira/browse/ZOOKEEPER-1065 I also attached a debug log there. Thanks, Gunnar -- Gunnar Wagenknecht [EMAIL PROTECTED] http://wagenknecht.org/
-
Re: Timing issue when running embedded?Patrick Hunt 2011-05-20, 17:23
Hi Gunnar, I believe I see something from the logs:
1) goto line 190 in the log, time index 09:00:59.357 2) search backward in the log file for the znode you are getting "nonode" on, this: /gyrex/prefs/cloud/org.eclipse.gyrex.http/applications/org.eclipse.gyrex.http.equinoxhttpservice.application 3) time index is now 09:00:58.968 (the most recent earlier operation on this znode), notice the operation being performed on the znode - it's explicitly being deleted by the client. This (3) is the cause of the log message from (1). So from what I see in the log it looks like expected behavior from the ZK view of the world. Does this make sense to you or am I missing something? Regards, Patrick On Thu, May 19, 2011 at 11:57 PM, Gunnar Wagenknecht <[EMAIL PROTECTED]> wrote: > Am 19.05.2011 21:14, schrieb Patrick Hunt: >> I'd also suggest creating a JIRA to track this issue: >> https://issues.apache.org/jira/browse/ZOOKEEPER > > Filed as > https://issues.apache.org/jira/browse/ZOOKEEPER-1065 > > I also attached a debug log there. > > Thanks, > Gunnar > > -- > Gunnar Wagenknecht > [EMAIL PROTECTED] > http://wagenknecht.org/ > > |