Re: reduce network problem after using cache dns
I'm not sure whether Java uses the system's libc resolver, but assuming it does, you cannot rely on utilities like nslookup or dig, because they use their own resolver.  Ping usually uses the libc resolver.  If you are on Linux, you can use "getent hosts $hostname" to definitively test the libc resolver.
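
To see what the JVM itself resolves, rather than what dig or nslookup report, something like the following might help. It is only a minimal sketch: the class name ResolverCheck is made up for illustration, and the only JDK call it relies on is InetAddress.getAllByName.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolverCheck {

    /** Resolves a name through the JVM and reports the addresses and elapsed time. */
    static String resolve(String host) {
        long start = System.nanoTime();
        try {
            InetAddress[] addrs = InetAddress.getAllByName(host);
            long ms = (System.nanoTime() - start) / 1_000_000;
            StringBuilder sb = new StringBuilder(host).append(" ->");
            for (InetAddress a : addrs) {
                sb.append(' ').append(a.getHostAddress());
            }
            return sb.append(" (").append(ms).append(" ms)").toString();
        } catch (UnknownHostException e) {
            // Same exception type that shows up in the reduce task's stack trace.
            long ms = (System.nanoTime() - start) / 1_000_000;
            return host + " -> UnknownHostException (" + ms + " ms)";
        }
    }

    public static void main(String[] args) {
        // Pass one or more hostnames on the command line.
        for (String host : args) {
            System.out.println(resolve(host));
        }
    }
}
```

Running "java ResolverCheck s06.xxx.local" on a task node and comparing the result with "getent hosts s06.xxx.local" should show whether Java and libc agree, and how long each lookup takes.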

If you really do want to use mDNS hostnames (i.e., names ending in ".local"), then you must have nss_mdns installed on your system and configure /etc/nsswitch.conf to use it.  You may also want to consider using nscd to cache DNS lookups.  If you are using mDNS, though, its dynamic nature means you may not want to cache results (especially negative lookups) for very long unless the host is assigned a static IP.
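
One related point, offered as something to check rather than a diagnosis: the JVM keeps its own lookup cache, separate from nscd, controlled by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl. Below is a rough sketch of setting them in code. The class name DnsCacheTtl and the values 60 and 5 are illustrative assumptions only, and the properties need to be set before the first lookup in the process to take effect.

```java
import java.security.Security;

public class DnsCacheTtl {

    /** Sets the JVM-wide DNS cache lifetimes, in seconds. */
    static void configure(int positiveSeconds, int negativeSeconds) {
        // How long successful lookups are cached.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveSeconds));
        // How long failed lookups are cached (kept short, per the mDNS caveat above).
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeSeconds));
    }

    public static void main(String[] args) {
        configure(60, 5); // illustrative values, not a recommendation
        System.out.println("positive ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("negative ttl = "
                + Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

The same properties can also be set in the JRE's java.security file, which avoids touching job code.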

I hope this helps.

On Jan 4, 2012, at 10:53 AM, Alexander Lorenz wrote:

> Hi,
> Please ping the host you want to reach and check your hosts file and your resolv.conf
> - Alex
> Alexander Lorenz
> http://mapredit.blogspot.com
> On Jan 4, 2012, at 7:28 AM, Oren <[EMAIL PROTECTED]> wrote:
>> so it seems but doing a dig from terminal command line returns the results correctly.
>> the same settings have been running on production servers (not hadoop) for months without problems.
>> clarification - i changed servers names in logs, domain isn't xxx.local originally..
>> On 01/04/2012 05:19 PM, Harsh J wrote:
>>> Looks like your caching DNS servers aren't really functioning as you'd
>>> expect them to?
>>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>>> java.net.UnknownHostException: s06.xxx.local
>>> (That .local also worries me, you probably have a misconfiguration in
>>> resolution somewhere.)
>>> On Wed, Jan 4, 2012 at 8:38 PM, Oren<[EMAIL PROTECTED]>  wrote:
>>>> hi.
>>>> i have a small hadoop grid connected  with a 1g network.
>>>> when servers are configured to use the local dns server the jobs are running
>>>> without a problem and copy speed during reduce is tens of MB.
>>>> once i change the servers to work with a cache only named server on each
>>>> node, i start to get failed tasks with timeout errors.
>>>> also, copy speed is reduced to under 1M.
>>>> there is NO degradation in network, copy of files between servers is still
>>>> tens of MB.
>>>> resolving is working ok and in the same speed (give or take) with both
>>>> configurations.
>>>> any idea of what happens during the map/reduce process that causes this
>>>> behavior?
>>>> this is an example for the exceptions i get during map:
>>>> Too many fetch-failures
>>>> and during reduce:
>>>> java.lang.RuntimeException:
>>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>>> java.net.UnknownHostException: s06.xxx.local at
>>>> org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:38)
>>>> at
>>>> org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:129)
>>>> at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:89) at
>>>> com.infolinks.hadoop.commons.hbase.HBaseOperations.getTable(HBaseOperations.java:118)
>>>> at com.infolinks.hadoop.framework.HBaseReducer.setup(HBaseReducer.java:71)
>>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174) at
>>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566) at
>>>> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at
>>>> org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by:
>>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>>> java.net.UnknownHostException: s06.xxx.local at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
>>>> at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
>>>> at
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
>>>> at
>>>> org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)