Re: Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker
Ronan Lehane 2012-07-16, 20:22
Thanks for the quick reply Harsh.
I think you may be onto something with the second suggestion.
I found an earlier thread saying that some of the troubleshooting steps
outlined below resolved a similar issue for that person:
Like you suggested, the /etc/hosts file definitely looks to be involved as
I hit different issues depending on what hostnames are set against the
I'll try resetting them to see if that resolves the issue.
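
To illustrate the kind of /etc/hosts check being discussed, here is a minimal Python sketch. It is not from the thread: the function name and the sample hosts content are hypothetical, and only the hostnames and the 10.21.68.218 address come from the messages quoted below. It flags any cluster hostname that a hosts file maps to a loopback address.

```python
import ipaddress


def loopback_hostnames(hosts_text, cluster_names):
    """Return the cluster hostnames that a hosts file maps to a loopback address."""
    flagged = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # skip entries whose first field is not a plain IP address
        if is_loopback:
            flagged.update(name for name in names if name in cluster_names)
    return sorted(flagged)


# Hypothetical hosts file content, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
127.0.1.1    jobtracker
10.21.68.218 jobtracker
"""

print(loopback_hostnames(SAMPLE_HOSTS, {"namenode", "jobtracker"}))
```

On a real node the text would presumably be read from /etc/hosts. A hostname flagged here is a candidate for the loopback binding that Harsh describes in his second point.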
On Mon, Jul 16, 2012 at 7:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> A couple of simple things to ensure first:
> 1. Make sure the firewall isn't the one at fault here. Best to disable
> firewall if you do not need it, or carefully configure the rules to
> allow in/out traffic over chosen ports.
> 2. Ensure that the hostnames fs.default.name and mapred.job.tracker
> bind to, are external IP-resolving hostnames and not localhost
> (loopback interface bound) addresses.
> On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <[EMAIL PROTECTED]> wrote:
> > Hi All,
> > I was wondering if anyone could help me figure out what's going wrong in
> > my five-node Hadoop cluster, please?
> > It consists of:
> > 1. NameNode
> > hduser@namenode:/usr/local/hadoop$ jps
> > 13049 DataNode
> > 13387 Jps
> > 12740 NameNode
> > 13316 SecondaryNameNode
> > 2. JobTracker
> > hduser@jobtracker:/usr/local/hadoop$ jps
> > 21817 TaskTracker
> > 21448 DataNode
> > 21542 JobTracker
> > 21862 Jps
> > 3. Slave1
> > hduser@slave1:/usr/local/hadoop$ jps
> > 21226 DataNode
> > 21514 Jps
> > 21463 TaskTracker
> > 4. Slave2
> > hduser@slave2:/usr/local/hadoop$ jps
> > 20938 Jps
> > 20650 DataNode
> > 20887 TaskTracker
> > 5. Slave3
> > hduser@slave3:/usr/local/hadoop$ jps
> > 22145 Jps
> > 21854 DataNode
> > 22091 TaskTracker
> > All DataNodes have been kicked off by running start-dfs.sh on the
> > All TaskTrackers have been kicked off by running start-mapred.sh on the
> > JobTracker
> > When I try to execute a simple wordcount job from the NameNode I receive
> > the following error:
> > 12/07/16 19:25:22 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:hduser cause:java.net.ConnectException:
> > to jobtracker/10.21.68.218:54311 failed on connection exception:
> > java.net.ConnectException: Connection refused
> > If I check the jobtracker:
> > 1. I can ping in both directions by both IP and Hostname
> > 2. I can see that the jobtracker is listening on port 54311
> > tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 425093 21542/java
> > 3. Telnet to this port from the NameNode fails with "Connection Refused"
> > telnet: Unable to connect to remote host: Connection refused
> > This issue can be worked around by moving the JobTracker functionality to
> > the NameNode, but when this is done the job is executed on the NameNode
> > rather than distributed across the cluster.
> > Checking the log files on the slave nodes, I see the Server Not Available
> > messages described on the wiki page below.
> > http://wiki.apache.org/hadoop/ServerNotAvailable
> > The DataNodes are not seeing the NameNode, and the TaskTrackers are not
> > seeing the JobTracker.
> > Checking the JobTracker web interface, it always states there is only 1
> > node available.
> > I've checked the 5 troubleshooting steps provided but it all looks to be
> > in my environment.
> > Would anyone have any ideas about what could be causing this?
> > Any help would be appreciated.
> > Cheers,
> > Ronan
> Harsh J
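
The netstat line in the original message shows the JobTracker listening on 127.0.0.1:54311. A socket bound to a loopback address only accepts local connections, which is consistent with the "Connection refused" seen from the NameNode. The following is a minimal Python sketch, not part of the thread, that scans netstat-style output for such listeners. The function name is hypothetical, the column layout is assumed to match the line quoted above, and the second sample line is invented for contrast.

```python
import ipaddress


def loopback_only_listeners(netstat_text):
    """Return (address, port) pairs for TCP listeners bound to a loopback address."""
    results = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local address, foreign address, state, ...
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        host, _, port = fields[3].rpartition(":")
        try:
            if ipaddress.ip_address(host).is_loopback:
                results.append((host, int(port)))
        except ValueError:
            continue  # skip local addresses that do not parse as IP:port
    return results


SAMPLE_NETSTAT = """\
tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 425093 21542/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 1234 900/sshd
"""

print(loopback_only_listeners(SAMPLE_NETSTAT))
```

The first sample line is the one from the original message. The second is a hypothetical wildcard listener, which the check leaves out because 0.0.0.0 accepts connections on every interface.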