Drill >> mail # user >> Distributed mode troubles: ZK/Curator connection time out


Re: Distributed mode troubles: ZK/Curator connection time out
One thing to add to the diagram is that all of the Drill Java processes
read their settings from drill-override.conf. You must set zk.connect there
to the correct ZooKeeper host:port.
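Because every Drillbit reads zk.connect from its own drill-override.conf, a malformed or mismatched value on any one node leaves that node unable to find the others. The minimal Java sketch below is a hypothetical helper, not part of Drill. It splits a ZooKeeper connect string of the form host1:port1,host2:port2 (with an optional /chroot suffix) into endpoints, so the value can be sanity-checked before startup. It assumes the value has already been read out of the config file. In later Drill releases the key is nested under drill.exec (drill.exec.zk.connect), so check the drill-module.conf shipped with your build for the exact layout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Drill API): splits a ZooKeeper connect string into endpoints. */
public class ZkConnectString {

    /** One ZooKeeper endpoint. */
    public record Endpoint(String host, int port) {
        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /**
     * Parses strings such as "zk1:2181,zk2:2181". An optional chroot path
     * ("/drill") after the host list is stripped before parsing. This sketch
     * deliberately requires an explicit port on every host.
     */
    public static List<Endpoint> parse(String connect) {
        if (connect == null || connect.isBlank()) {
            throw new IllegalArgumentException("empty connect string");
        }
        String hosts = connect.trim();
        int slash = hosts.indexOf('/');
        if (slash >= 0) {
            hosts = hosts.substring(0, slash); // drop the chroot path
        }
        List<Endpoint> result = new ArrayList<>();
        for (String part : hosts.split(",")) {
            String p = part.trim();
            int colon = p.lastIndexOf(':');
            if (colon <= 0 || colon == p.length() - 1) {
                throw new IllegalArgumentException("expected host:port, got '" + p + "'");
            }
            int port;
            try {
                port = Integer.parseInt(p.substring(colon + 1));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("bad port in '" + p + "'", e);
            }
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("port out of range in '" + p + "'");
            }
            result.add(new Endpoint(p.substring(0, colon), port));
        }
        return result;
    }

    public static void main(String[] args) {
        // Every Drillbit in the cluster should be configured with this same value.
        System.out.println(parse("zk1:2181,zk2:2181,zk3:2181/drill"));
    }
}
```

Running main prints `[zk1:2181, zk2:2181, zk3:2181]`; a value such as `localhost` with no port is rejected with an IllegalArgumentException rather than silently accepted.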
On Sun, Oct 27, 2013 at 2:00 PM, Michael Hausenblas <
[EMAIL PROTECTED]> wrote:

>
> Folks,
>
> I’m trying to set up Drill in distributed mode. Here’s what I have so far:
> when I launch the first Drillbit with bin/drillbit.sh I get the following
> in log/drillbit.out:
>
> [[
> 20:47:20.963 [main] ERROR com.netflix.curator.ConnectionState - Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (5045)
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) ~[curator-client-1.1.9.jar:na]
>         at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106) [curator-client-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:393) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85) [curator-client-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36) [curator-framework-1.1.9.jar:na]
>         at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.getChildrenWatched(ServiceDiscoveryImpl.java:306) [curator-x-discovery-1.1.9.jar:na]
>         at com.netflix.curator.x.discovery.details.ServiceDiscoveryImpl.queryForInstances(ServiceDiscoveryImpl.java:276) [curator-x-discovery-1.1.9.jar:na]
>         at com.netflix.curator.x.discovery.details.ServiceCache.refresh(ServiceCache.java:193) [curator-x-discovery-1.1.9.jar:na]
>         at com.netflix.curator.x.discovery.details.ServiceCache.start(ServiceCache.java:116) [curator-x-discovery-1.1.9.jar:na]
>         at org.apache.drill.exec.coord.ZKClusterCoordinator.start(ZKClusterCoordinator.java:89) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:94) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:56) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:43) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
>         at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:65) [java-exec-1.0.0-m1-rebuffed.jar:1.0.0-m1]
> ]]
>
> Is this a known issue? See
> http://stackoverflow.com/questions/16056751/curator-zookeeper-client-keeps-throw-out-connectionlossexception-per-connection
>
> Any ideas? Has anyone actually run Drill in distributed mode yet and,
> if so, how did you get past the above issue?
>
> What is next? How do I make other Drillbits point to the same ZK cluster?
> And does anyone have an example of the call parameters for
> bin/submit_plan as well?
>
>
> BTW, while trying to figure out what’s going on behind the scenes, I
> traced the startup call dependencies (scripts), available via:
>
>
> https://docs.google.com/drawings/d/1-ADIGJ-lBr-dOrOjMpQlProiZjYjjuM0kR6A81BYwKA/edit?usp=sharing
>
> which we could then also use for documentation purposes.
>
>
> Cheers,
>                 Michael
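The ConnectionLossException in the quoted log means the Curator client could not reach ZooKeeper at localhost:2181 within the 5000 ms timeout. Seeing localhost there suggests either that zk.connect was not overridden on that node or that no ZooKeeper server was listening locally. Before digging into Curator itself, a plain TCP check from each Drillbit host can rule out the network and the ZooKeeper server. The sketch below is a hypothetical diagnostic that uses only java.net; the class and method names are illustrative. A successful connect shows that the port is open, but it does not prove the server is a healthy ZooKeeper.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical diagnostic: checks whether a TCP connection to a ZooKeeper
 * endpoint can be opened within a timeout. It only tests that the port
 * accepts connections; it does not speak the ZooKeeper protocol.
 */
public class ZkReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable, unknown host, or timed out.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        // 5000 ms mirrors the timeout reported in the Drillbit log above.
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it on each Drillbit host with the same host and port given in that node's zk.connect; if it reports the endpoint as unreachable, the problem lies with the network or the ZooKeeper server rather than with Drill or Curator.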