Marshall McMullen 2012-05-22, 04:31
In our Linux environment, we're using IP addresses only for all our zookeeper servers. We've observed that without a functioning DNS server, zookeeper peers cannot communicate with one another. We have been able to work around this in the past by putting entries in /etc/hosts for all the zookeeper servers. With entries in /etc/hosts no reverse name lookup is performed and everything works fine.
Has anyone else seen this behavior, or can anyone confirm or deny whether zookeeper requires (assumes) a functioning DNS server?
I've gone through a lot of the quorum code related to IP addresses, and I thought the culprit might be calls to InetAddress.getByName. But looking at the source code for that (at least in openjdk), it returns early, without doing a lookup, if the given string is a literal IP address. Another thought was the calls to InetSocketAddress(hostname, port), but that looks like it similarly goes through InetAddress, so that should be OK.
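To illustrate the point about literal addresses, here is a minimal sketch, not taken from the zookeeper source. The class name LiteralAddressDemo and the helper parseLiteral are made up for this example; the address and port are the ones that appear in the logs later in this thread. It relies on the documented behavior of InetAddress.getByName, which only checks the format when it is handed a literal IP address.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical demo class, not part of zookeeper.
public class LiteralAddressDemo {

    // For a literal IP address, getByName only validates the format;
    // it does not need a resolver to build the InetAddress.
    static InetAddress parseLiteral(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a usable address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        InetAddress addr = parseLiteral("127.0.0.2");
        // getHostAddress() returns the textual IP and never does a lookup.
        System.out.println(addr.getHostAddress());

        // The (String, int) constructor goes through InetAddress.getByName,
        // so a literal address is resolved immediately without DNS.
        InetSocketAddress sock = new InetSocketAddress("127.0.0.2", 2181);
        System.out.println(sock.isUnresolved());
        System.out.println(sock.getPort());
    }
}
```

If this runs instantly on a machine with no resolver configured, the construction calls are probably not where the time goes, which points the search at code that reads a host name back out of an address.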
Anyhow, I'll keep digging into this, but any ideas or help would be appreciated!
-
Re: reverse name lookup?
Flavio Junqueira 2012-05-22, 07:19
Are they able to elect a leader or not even that?
-Flavio
-
Re: reverse name lookup?
Marshall McMullen 2012-05-23, 15:02
Sorry, had jury duty yesterday so wasn't able to respond until now....
No, leader election fails as well. What's interesting is that the bind of the leader socket/port itself never seems to complete. I enabled tracing, and without a valid DNS server running, all zookeeper servers just hang at startup. The last thing printed in every trace file before the hang is:
2012-05-23 08:49:29,882 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
2012-05-23 08:49:29,893 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
2012-05-23 08:49:29,899 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10

Now, if I repeat the test with a functioning DNS server running, it churns along as expected:

2012-05-23 08:58:03,468 [myid:1] - INFO [main:QuorumPeerMain@131][] - Starting quorum peer
2012-05-23 08:58:03,479 [myid:1] - INFO [main:NIOServerCnxnFactory@108][] - binding to port /127.0.0.2:2181
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1107][] - tickTime set to 2000
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1127][] - minSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
2012-05-23 08:58:03,485 [myid:1] - INFO [main:QuorumPeer@1153][] - initLimit set to 10
2012-05-23 08:58:03,665 [myid:1] - INFO [main:QuorumPeer@620][] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-05-23 08:58:03,666 [myid:1] - INFO [main:QuorumPeer@635][] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-05-23 08:58:03,670 [myid:1] - INFO [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /127.0.0.1:54763
2012-05-23 08:58:03,672 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /127.0.0.2:2183
2012-05-23 08:58:03,675 [myid:1] - WARN [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-05-23 08:58:03,675 [myid:1] - DEBUG [QuorumPeer[myid=1]/127.0.0.2:2181:QuorumPeer@825][] - Starting quorum peer
2012-05-23 08:58:03,675 [myid:1] - DEBUG [NIOServerCxn.Factory:/127.0.0.2:2181:NIOServerCnxn@358][] - IOException stack trace
So the leader socket/port bind appears to hang indefinitely without DNS. As I mentioned before, we are using IP addresses only, and in our customer environment we will not always have DNS, so we'd really like to remove this requirement. Does anyone have ideas on where I can start looking to figure this out?
-
Re: reverse name lookup?
Marshall McMullen 2012-05-23, 15:09
OK, I think I may have found the culprit.
There are lots of places where we call InetSocketAddress.getHostName(). The documentation on this is worthless, but looking at the openjdk source code, getHostName absolutely triggers a reverse DNS lookup.
I'm going to try modifying these to just use the toString function (which doesn't do the lookup) and see if I get past this problem.
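As a rough sketch of the kind of change described above (this is not the actual zookeeper patch), the snippet below formats a peer address without calling getHostName(). The class HostStringDemo and the helpers describe and ipOnly are hypothetical names. It uses InetSocketAddress.getHostString(), which is not mentioned in this thread and only exists on Java 7 and later; it returns the host name if one is already known and otherwise the literal IP, without a reverse lookup. On older JVMs, getAddress().getHostAddress() gives the IP text in the same lookup-free way.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of zookeeper.
public class HostStringDemo {

    // Formats host:port without a reverse DNS lookup.
    // Assumes Java 7+, where getHostString() is available.
    static String describe(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    // Returns only the IP text. Works on older JVMs too, since
    // getHostAddress() never consults a resolver.
    static String ipOnly(InetSocketAddress addr) {
        InetAddress ip = addr.getAddress();
        // getAddress() is null for an unresolved socket address;
        // fall back to whatever host string was supplied.
        return ip == null ? addr.getHostString() : ip.getHostAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress election = new InetSocketAddress("127.0.0.2", 2183);
        System.out.println(describe(election));
        System.out.println(ipOnly(election));
        // By contrast, election.getHostName() may block here while the
        // JVM attempts a reverse lookup of 127.0.0.2.
    }
}
```

Either helper could stand in for getHostName() in log messages and address comparisons, assuming the callers only need a printable or comparable form of the address rather than a real host name.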
I'll update with progress.