Distributed mode troubles: ZK/Curator connection time out

Folks,

I’m trying to set up Drill in distributed mode. Here’s what I have so far: when I launch the first Drillbit with bin/drillbit.sh I get the following in log/drillbit.out:

[[
20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
]]

This seems to be a known issue? See http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
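A quick way to narrow this down is to check whether anything answers on localhost:2181 at all, since the log shows Curator giving up after its 5000 ms connection timeout. The following is only a minimal sketch, not part of Drill or Curator: `probe_zookeeper` is a hypothetical helper name, and it assumes the ZooKeeper server accepts the `ruok` four-letter command (newer ZooKeeper releases may require this command to be whitelisted, in which case the reply will differ).

```python
import socket


def probe_zookeeper(host="localhost", port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command and return the reply.

    Hypothetical helper for illustration. Returns the decoded reply
    (a healthy server normally answers 'imok'), or None if the TCP
    connection cannot be established or times out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                # The server closes the connection after replying,
                # which makes recv() return an empty bytes object.
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", errors="replace")
    except OSError:
        # Covers refused connections and socket timeouts alike.
        return None
```

Calling `probe_zookeeper()` with the defaults mirrors the connection string and timeout from the log. A result of `None` suggests that no ZooKeeper server is reachable on that port, which would point at the ZooKeeper setup and not at Drill; any reply at all means something is listening there.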

Any ideas? Has anyone already run Drill in distributed mode, and if so, how did you get past the issue above?

What comes next? How do I point other Drillbits at the same ZK cluster? Does anyone also have an example of the call parameters for bin/submit_plan?

BTW, while trying to figure out what’s going on behind the scenes, I traced the startup call dependencies (scripts), available at:

  https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing

which we could also use for documentation purposes.

Cheers,
Michael

--
Michael Hausenblas
Ireland, Europe
http://mhausenblas.info/