Zookeeper, mail # dev - reverse name lookup?


Re: reverse name lookup?
Marshall McMullen 2012-05-23, 15:02
Sorry, had jury duty yesterday so wasn't able to respond until now....

No, leader election fails as well. What's interesting is that the bind of
the leader socket/port itself never seems to complete. I enabled tracing,
and without a valid DNS server running, all zookeeper servers just hang at
startup. In every trace file, the last thing printed before the hang is:

2012-05-23 08:49:29,882 [myid:1] - INFO  [main:QuorumPeerMain@131][] -
Starting quorum peer
2012-05-23 08:49:29,893 [myid:1] - INFO  [main:NIOServerCnxnFactory@108][]
- binding to port /127.0.0.2:2181
2012-05-23 08:49:29,899 [myid:1] - INFO  [main:QuorumPeer@1107][] -
tickTime set to 2000
2012-05-23 08:49:29,899 [myid:1] - INFO  [main:QuorumPeer@1127][] -
minSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO  [main:QuorumPeer@1138][] -
maxSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO  [main:QuorumPeer@1153][] -
initLimit set to 10
Now, if I repeat the test with a functioning DNS server running, it churns
along as expected:

2012-05-23 08:58:03,468 [myid:1] - INFO  [main:QuorumPeerMain@131][] -
Starting quorum peer
2012-05-23 08:58:03,479 [myid:1] - INFO  [main:NIOServerCnxnFactory@108][]
- binding to port /127.0.0.2:2181
2012-05-23 08:58:03,485 [myid:1] - INFO  [main:QuorumPeer@1107][] -
tickTime set to 2000
2012-05-23 08:58:03,485 [myid:1] - INFO  [main:QuorumPeer@1127][] -
minSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO  [main:QuorumPeer@1138][] -
maxSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO  [main:QuorumPeer@1153][] -
initLimit set to 10
2012-05-23 08:58:03,665 [myid:1] - INFO  [main:QuorumPeer@620][] -
currentEpoch not found! Creating with a reasonable default of 0. This
should only happen when you are upgrading your installation
2012-05-23 08:58:03,666 [myid:1] - INFO  [main:QuorumPeer@635][] -
acceptedEpoch not found! Creating with a reasonable default of 0. This
should only happen when you are upgrading your installation
2012-05-23 08:58:03,670 [myid:1] - INFO
 [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxnFactory@227][] -
Accepted socket connection from /127.0.0.1:54763
2012-05-23 08:58:03,672 [myid:1] - INFO
 [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind
port: /127.0.0.2:2183
2012-05-23 08:58:03,675 [myid:1] - WARN
 [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@354][] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer
not running
2012-05-23 08:58:03,675 [myid:1] - DEBUG [QuorumPeer[myid=1]/127.0.0.2:2181
:QuorumPeer@825][] - Starting quorum peer
2012-05-23 08:58:03,675 [myid:1] - DEBUG
[NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@358][] - IOException
stack trace

So the leader socket/port bind appears to hang indefinitely without DNS. As
I mentioned before, we are using IP addresses only, and in our customer
environment we will not always have DNS, so we'd really like to remove this
requirement. Does anyone have ideas about where I can start looking to
figure this out?
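
A minimal sketch for narrowing this down outside of zookeeper, assuming the
stall comes from a reverse name lookup (not confirmed in this thread). The
class and method names are hypothetical and this is not zookeeper code; it
only uses standard java.net calls. InetAddress.getByName with an IP literal
just validates the format, and InetSocketAddress.getHostString() (Java 7+)
does not attempt a reverse lookup, whereas InetAddress.getHostName() may
perform one and can block when no resolver is reachable:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical standalone probe; not part of zookeeper.
class ReverseLookupProbe {
    // Builds a socket address from an IP literal and returns its host string.
    // For an address created from a literal, getHostString() returns the
    // textual IP and does not attempt a reverse lookup.
    static String literalHost(String ip, int port) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(ip); // literal: format check only
        return new InetSocketAddress(addr, port).getHostString();
    }

    // Times a call to getHostName(), which may trigger a reverse lookup
    // through the system resolver (/etc/hosts, DNS, ...).
    static long timeReverseLookupMillis(String ip) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(ip);
        long start = System.nanoTime();
        addr.getHostName();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the address and election port seen in the logs above.
        String ip = args.length > 0 ? args[0] : "127.0.0.2";
        System.out.println("getHostString(): " + literalHost(ip, 2183));
        System.out.println("getHostName() took "
                + timeReverseLookupMillis(ip) + " ms");
    }
}
```

Running this on a host with and without a reachable DNS server (and with and
without an /etc/hosts entry for the address) should show whether
getHostName() is the call that blocks in a given environment.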
On Tue, May 22, 2012 at 1:19 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> Are they able to elect a leader or not even that?
>
> -Flavio
>
> On May 22, 2012, at 6:31 AM, Marshall McMullen wrote:
>
> > In our Linux environment, we're using IP addresses only for all our
> > zookeeper servers. We've observed that without a functioning DNS server,
> > zookeeper peers cannot communicate with one another. We have been able to
> > work around this in the past by putting entries in /etc/hosts for all the
> > zookeeper servers. With entries in /etc/hosts no reverse name lookup is
> > performed and everything works fine.
> >
> > Has anyone else seen this behavior, or can anyone confirm or deny whether
> > zookeeper requires (assumes) a functioning DNS server?
> >
> > I've gone through a lot of the quorum code related to IP addresses, and I
> > thought the culprit might be calls to InetAddress.getByName. But looking at
> > the source code for that (at least in openjdk) they return if the given