Re: Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker
A couple of simple things to ensure first:
1. Make sure the firewall isn't at fault here. It's best to disable the
firewall if you don't need it; otherwise, carefully configure its rules
to allow inbound and outbound traffic on the ports you have chosen.
2. Ensure that the hostnames that fs.default.name and mapred.job.tracker
bind to resolve to external IP addresses, not to localhost (addresses
bound to the loopback interface).
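Both checks can be scripted if you have several nodes to go through. Below
is a minimal Python 3 sketch; the function names are illustrative helpers,
not part of Hadoop. resolves_to_loopback flags a hostname that resolves to
127.x.x.x or ::1, and port_reachable repeats the telnet-style TCP connect
test, which fails both when a firewall blocks the port and when nothing is
listening on a reachable address.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if any address `hostname` resolves to is a loopback address.

    The hostnames used in fs.default.name and mapred.job.tracker should
    resolve to an address the other nodes can reach, not 127.x.x.x or ::1.
    Raises socket.gaierror if the name does not resolve at all.
    """
    addresses = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    # Drop any IPv6 zone suffix (e.g. "fe80::1%eth0") before parsing.
    return any(ipaddress.ip_address(addr.split("%")[0]).is_loopback
               for addr in addresses)


def port_reachable(host, port, timeout=3.0):
    """Attempt a TCP connection to (host, port), like the telnet test.

    Returns False on connection refused, timeout, or name-resolution failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, resolves_to_loopback("jobtracker") run on the jobtracker host
itself should return False, and port_reachable("jobtracker", 54311) run from
the NameNode should return True once the JobTracker is bound correctly.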
On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <[EMAIL PROTECTED]> wrote:
> Hi All,
> I was wondering if anyone could help me figure out what's going wrong in my
> five-node Hadoop cluster, please?
> It consists of:
> 1. NameNode
> hduser@namenode:/usr/local/hadoop$ jps
> 13049 DataNode
> 13387 Jps
> 12740 NameNode
> 13316 SecondaryNameNode
> 2. JobTracker
> hduser@jobtracker:/usr/local/hadoop$ jps
> 21817 TaskTracker
> 21448 DataNode
> 21542 JobTracker
> 21862 Jps
> 3. Slave1
> hduser@slave1:/usr/local/hadoop$ jps
> 21226 DataNode
> 21514 Jps
> 21463 TaskTracker
> 4. Slave2
> hduser@slave2:/usr/local/hadoop$ jps
> 20938 Jps
> 20650 DataNode
> 20887 TaskTracker
> 5. Slave3
> hduser@slave3:/usr/local/hadoop$ jps
> 22145 Jps
> 21854 DataNode
> 22091 TaskTracker
> All DataNodes have been kicked off by running start-dfs.sh on the NameNode
> All TaskTrackers have been kicked off by running start-mapred.sh on the
> JobTracker
> When I try to execute a simple wordcount job from the NameNode I receive
> the following error:
> 12/07/16 19:25:22 ERROR security.UserGroupInformation:
> PriviledgedActionException as:hduser cause:java.net.ConnectException: Call
> to jobtracker/10.21.68.218:54311 failed on connection exception:
> java.net.ConnectException: Connection refused
> If I check the jobtracker:
> 1. I can ping in both directions by both IP and Hostname
> 2. I can see that the jobtracker is listening on port 54311
> tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 425093 21542/java
> 3. Telnet to this port from the NameNode fails with "Connection Refused"
> telnet: Unable to connect to remote host: Connection refused
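The netstat line above points at item 2: the JobTracker socket is bound to
127.0.0.1:54311, the loopback interface, so only processes on the jobtracker
host itself can connect, and remote connections are refused, which is what
the telnet test shows. A small Python sketch can flag such loopback-only
listeners when scanning many nodes; it assumes the column order of Linux
netstat output like the line quoted above (proto, Recv-Q, Send-Q, local
address, foreign address, state, ...), and the function name is
illustrative.

```python
import ipaddress


def loopback_only_listener(netstat_line):
    """Return True if a netstat LISTEN line is bound to a loopback address.

    Assumes Linux netstat column order: proto, recv-q, send-q,
    local address, foreign address, state, ...
    """
    fields = netstat_line.split()
    if len(fields) < 6 or fields[5] != "LISTEN":
        raise ValueError("not a netstat LISTEN line: %r" % netstat_line)
    # Split off the port from the right; IPv6 addresses contain colons too.
    host = fields[3].rsplit(":", 1)[0]
    # 0.0.0.0 and :: (all interfaces) are not loopback, so they return False.
    return ipaddress.ip_address(host).is_loopback
```

If it returns True for the JobTracker port, check what mapred.job.tracker
is set to and what that hostname resolves to on the jobtracker host.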
> This issue can be worked around by moving the JobTracker functionality to
> the NameNode, but when this is done the job is executed on the NameNode
> rather than distributed across the cluster.
> Checking the log files on the slave nodes, I see the Server Not Available
> messages described on the wiki page below.
> The Data Nodes not seeing the NameNode and the Task Trackers not seeing
> the JobTracker
> The JobTracker web interface always reports only 1 node available.
> I've checked the five troubleshooting steps provided there, and everything
> looks OK in my environment.
> Would anyone have any ideas about what could be causing this?
> Any help would be appreciated.