Hadoop >> mail # user >> Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker


Re: Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker
Ronan,

A couple of simple things to ensure first:

1. Make sure the firewall isn't at fault here. It is best to disable the
firewall if you do not need it, or to carefully configure its rules to
allow inbound and outbound traffic over the chosen ports.
2. Ensure that the hostnames used in fs.default.name and
mapred.job.tracker resolve to externally reachable IP addresses and not
to localhost (loopback interface) addresses.
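Both checks can be scripted. Below is a minimal Python sketch, not an official Hadoop tool: the function names are made up for illustration, and it only uses the standard library. It reports whether an address is a loopback address, lists what a hostname resolves to, and tests whether a TCP port accepts connections. The netstat output quoted below shows the JobTracker listening on 127.0.0.1:54311, which is the kind of binding the second check is meant to catch.

```python
import ipaddress
import socket


def is_loopback(ip):
    """Return True if ip (for example '127.0.0.1') is a loopback address."""
    # Drop an IPv6 scope id such as '%eth0' before parsing.
    return ipaddress.ip_address(ip.split("%")[0]).is_loopback


def resolved_addresses(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and resolution failures.
        return False
```

For example, on the JobTracker host, calling resolved_addresses("jobtracker") and passing each result to is_loopback() shows whether the daemon's own hostname maps to a loopback address there. From another node, port_reachable("jobtracker", 54311) repeats the telnet test; the hostname and port here are the ones from the error message in this thread, so substitute your own values.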

On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I was wondering if anyone could help me figure out what's going wrong in my
> five node Hadoop cluster, please?
>
> It consists of:
> 1. NameNode
> hduser@namenode:/usr/local/hadoop$ jps
> 13049 DataNode
> 13387 Jps
> 12740 NameNode
> 13316 SecondaryNameNode
>
> 2. JobTracker
> hduser@jobtracker:/usr/local/hadoop$ jps
> 21817 TaskTracker
> 21448 DataNode
> 21542 JobTracker
> 21862 Jps
>
> 3. Slave1
> hduser@slave1:/usr/local/hadoop$ jps
> 21226 DataNode
> 21514 Jps
> 21463 TaskTracker
>
> 4. Slave2
> hduser@slave2:/usr/local/hadoop$ jps
> 20938 Jps
> 20650 DataNode
> 20887 TaskTracker
>
> 5. Slave3
> hduser@slave3:/usr/local/hadoop$ jps
> 22145 Jps
> 21854 DataNode
> 22091 TaskTracker
>
> All DataNodes have been kicked off by running start-dfs.sh on the NameNode.
> All TaskTrackers have been kicked off by running start-mapred.sh on the
> JobTracker.
>
> When I try to execute a simple wordcount job from the NameNode I receive
> the following error:
> 12/07/16 19:25:22 ERROR security.UserGroupInformation:
> PriviledgedActionException as:hduser cause:java.net.ConnectException: Call
> to jobtracker/10.21.68.218:54311 failed on connection exception:
> java.net.ConnectException: Connection refused
>
> If I check the jobtracker:
> 1. I can ping in both directions by both IP and Hostname
> 2. I can see that the jobtracker is listening on port 54311
> tcp        0      0 127.0.0.1:54311         0.0.0.0:*               LISTEN      1001       425093      21542/java
> 3. Telnet to this port from the NameNode fails with "Connection Refused"
> telnet: Unable to connect to remote host: Connection refused
>
> This issue can be worked around by moving the JobTracker functionality to
> the NameNode, but when this is done the job is executed on the NameNode
> rather than distributed across the cluster.
> Checking the log files on the slave nodes, I see the Server Not Available
> messages described on the wiki page below:
> http://wiki.apache.org/hadoop/ServerNotAvailable
> The DataNodes are not seeing the NameNode and the TaskTrackers are not
> seeing the JobTracker.
> Checking the JobTracker web interface, it always states there is only 1
> node available.
>
> I've checked the 5 troubleshooting steps provided but it all looks to be ok
> in my environment.
>
> Would anyone have any ideas of what could be causing this?
> Any help would be appreciated.
>
> Cheers,
> Ronan

--
Harsh J