Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase, mail # dev - 5 seconds pause connecting to ZK in unit tests


+
Jean-Daniel Cryans 2012-11-07, 18:13
+
Jean-Daniel Cryans 2012-11-07, 18:18
+
Kevin Odell 2012-11-07, 18:22
+
Jean-Daniel Cryans 2012-11-07, 18:28
Copy link to this message
-
Re: 5 seconds pause connecting to ZK in unit tests
Andrew Purtell 2012-11-07, 18:33
Or Hudsons/Jenkins running the HBase test suite.
On Wed, Nov 7, 2012 at 10:28 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]>wrote:

> I was reading that it's impossible to do in OSX but totally forgot
> about -Djava.net.preferIPv4Stack=true. And it does fix the issue.
>
> I wonder how much this is affecting users.
>
> J-D
>
> On Wed, Nov 7, 2012 at 10:22 AM, Kevin O'dell <[EMAIL PROTECTED]>
> wrote:
> > Can you disable IPv6 on that machine and see if it will connect over IPv4
> > or set the java setting(don't remember off the top of my head) to prefer
> > IPv4?
> >
> > On Wed, Nov 7, 2012 at 1:18 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]
> >wrote:
> >
> >> Or if I use the shell on my machine:
> >>
> >> 12/11/07 10:17:15 INFO zookeeper.RecoverableZooKeeper: Running exists
> >> on /hbase/master
> >> 12/11/07 10:17:20 INFO zookeeper.ClientCnxn: Socket connection
> >> established to fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181,
> >> initiating session
> >> 12/11/07 10:17:20 DEBUG zookeeper.ClientCnxn: Session establishment
> >> request sent on fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181
> >> 12/11/07 10:17:20 INFO zookeeper.ClientCnxn: Session establishment
> >> complete on server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181,
> >> sessionid = 0x13adc16563a0003, negotiated timeout = 40000
> >> 12/11/07 10:17:20 DEBUG zookeeper.ClientCnxn: Reading reply
> >> sessionid:0x13adc16563a0003, packet:: clientPath:null serverPath:null
> >> finished:false header:: 1,3  replyHeader:: 1,36,0  request::
> >> '/hbase/master,T  response::
> >> s{10,10,1352312182945,1352312182945,0,0,0,88625131181178880,73,0,10}
> >> 12/11/07 10:17:20 INFO zookeeper.RecoverableZooKeeper: Done with
> >> exists on /hbase/master
> >>
> >> This sounds DNS-y...
> >>
> >> On Wed, Nov 7, 2012 at 10:13 AM, Jean-Daniel Cryans <
> [EMAIL PROTECTED]>
> >> wrote:
> >> > Hey devs,
> >> >
> >> > I'm wondering if anyone else saw this. I was running TestReplication
> >> > on my machine and I saw it failing in testDeleteTypes trying to
> >> > replicate _3_ rows:
> >> >
> >> > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 59.62
> >> > sec <<< FAILURE!
> >> > testDeleteTypes(org.apache.hadoop.hbase.replication.TestReplication)
> >> > Time elapsed: 0.081 sec  <<< FAILURE!
> >> > java.lang.AssertionError: Waited too much time for put replication
> >> >         at org.junit.Assert.fail(Assert.java:93)
> >> >         at
> >>
> org.apache.hadoop.hbase.replication.TestReplication.testDeleteTypes(TestReplication.java:242)
> >> >
> >> > Digging down, I was able to jstack while the test is waiting for
> >> > replication and I see:
> >> >
> >> > Thread[REPL IPC Server handler 0 on 62520,5,main]
> >> >         at java.lang.Object.wait(Native Method)
> >> >         at java.lang.Object.wait(Object.java:485)
> >> >         at
> >> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1259)
> >> >         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:998)
> >> >         at
> >>
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:171)
> >> >
> >> > It stays there the whole time. Adding debug around exists I see:
> >> >
> >> > 2012-11-07 10:01:15,759 INFO  [REPL IPC Server handler 0 on 62520]
> >> > zookeeper.RecoverableZooKeeper(170): Running exists on /2/master
> >> > 2012-11-07 10:01:20,244 INFO  [REPL IPC Server handler 0 on 62520]
> >> > zookeeper.RecoverableZooKeeper(184): Done with exists on /2/master
> >> >
> >> > If I turn on debug for ZK I see that ZK doesn't process the exists
> >> > until a few ms before the call returns.
> >> >
> >> > FWIW I see other exits() hanging for about 5 seconds in the log, and
> >> > it always seems to happen on the first call after the session is
> >> > started.
> >> >
> >> > Elliott saw the same issue with his machine yesterday but I'm
> >> > wondering if anyone else saw it and/or have a clue.
> >> >
> >> > Thx,
> >> >
> >> > J-D
> >>
> >
> >
> >
> > --
> > Kevin O'Dell

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
+
Jean-Daniel Cryans 2012-11-07, 18:53