Hadoop, mail # user - Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker


Re: Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker
Ronan Lehane 2012-07-16, 20:22
Thanks for the quick reply Harsh.
I think you may be onto something with the second suggestion.

I found an earlier thread saying that some of the troubleshooting steps
outlined below resolved a similar issue for that person:
http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Like you suggested, the /etc/hosts file definitely looks to be involved as
I hit different issues depending on what hostnames are set against the
loopback addresses.
I'll try resetting them to see if that resolves the issue.
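
One way to spot suspect entries is to scan the hosts file for cluster
hostnames that are mapped to loopback addresses. The following is a minimal
Python sketch, not a definitive tool. The sample file contents, the
127.0.1.1 entry, the allow-list of names, and the function name are all
assumptions made for illustration; only the jobtracker hostname and its
10.21.68.218 address come from this thread.

```python
import ipaddress

# Names that are expected to sit on loopback lines (assumed allow-list).
EXPECTED_LOOPBACK_NAMES = {
    "localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback",
}

def loopback_hostnames(hosts_text):
    """Return names, other than the expected ones, mapped to a loopback address."""
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if addr.is_loopback:
            flagged.extend(
                name for name in fields[1:]
                if name not in EXPECTED_LOOPBACK_NAMES
            )
    return flagged

# Hypothetical hosts file: the 127.0.1.1 line is an assumed example entry.
SAMPLE_HOSTS = """\
127.0.0.1     localhost
127.0.1.1     jobtracker
10.21.68.218  jobtracker
"""

print(loopback_hostnames(SAMPLE_HOSTS))  # ['jobtracker']
```

To check a real node, pass in the contents of its own file, for example
loopback_hostnames(open("/etc/hosts").read()). Any cluster hostname that is
reported is a candidate for removal from the loopback line.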

Thanks,
Ronan
On Mon, Jul 16, 2012 at 7:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Ronan,
>
> A couple of simple things to ensure first:
>
> 1. Make sure the firewall isn't the one at fault here. Best to disable
> the firewall if you do not need it, or carefully configure the rules to
> allow in/out traffic over the chosen ports.
> 2. Ensure that the hostnames that fs.default.name and mapred.job.tracker
> bind to resolve to external IPs and not to localhost (loopback
> interface bound) addresses.
>
> On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <[EMAIL PROTECTED]>
> wrote:
> > Hi All,
> >
> > I was wondering if anyone could help me figure out what's going wrong in
> > my five-node Hadoop cluster, please?
> >
> > It consists of:
> > 1. NameNode
> > hduser@namenode:/usr/local/hadoop$ jps
> > 13049 DataNode
> > 13387 Jps
> > 12740 NameNode
> > 13316 SecondaryNameNode
> >
> > 2. JobTracker
> > hduser@jobtracker:/usr/local/hadoop$ jps
> > 21817 TaskTracker
> > 21448 DataNode
> > 21542 JobTracker
> > 21862 Jps
> >
> > 3. Slave1
> > hduser@slave1:/usr/local/hadoop$ jps
> > 21226 DataNode
> > 21514 Jps
> > 21463 TaskTracker
> >
> > 4. Slave2
> > hduser@slave2:/usr/local/hadoop$ jps
> > 20938 Jps
> > 20650 DataNode
> > 20887 TaskTracker
> >
> > 5. Slave3
> > hduser@slave3:/usr/local/hadoop$ jps
> > 22145 Jps
> > 21854 DataNode
> > 22091 TaskTracker
> >
> > All DataNodes have been kicked off by running start-dfs.sh on the
> > NameNode.
> > All TaskTrackers have been kicked off by running start-mapred.sh on the
> > JobTracker.
> >
> > When I try to execute a simple wordcount job from the NameNode I receive
> > the following error:
> > 12/07/16 19:25:22 ERROR security.UserGroupInformation:
> > PriviledgedActionException as:hduser cause:java.net.ConnectException: Call
> > to jobtracker/10.21.68.218:54311 failed on connection exception:
> > java.net.ConnectException: Connection refused
> >
> > If I check the jobtracker:
> > 1. I can ping in both directions by both IP and Hostname
> > 2. I can see that the jobtracker is listening on port 54311
> > tcp        0      0 127.0.0.1:54311         0.0.0.0:*               LISTEN      1001       425093      21542/java
> > 3. Telnet to this port from the NameNode fails with "Connection Refused"
> > telnet: Unable to connect to remote host: Connection refused
> >
> > This issue can be worked around by moving the JobTracker functionality to
> > the NameNode, but when this is done the job is executed on the NameNode
> > rather than distributed across the cluster.
> > Checking the log files on the slave nodes, I see the Server Not Available
> > messages described on the wiki page below:
> > http://wiki.apache.org/hadoop/ServerNotAvailable
> > The DataNodes are not seeing the NameNode and the TaskTrackers are not
> > seeing the JobTracker.
> > Checking the JobTracker web interface, it always states there is only 1
> > node available.
> >
> > I've checked the 5 troubleshooting steps provided but it all looks to be
> > ok in my environment.
> >
> > Would anyone have any ideas about what could be causing this?
> > Any help would be appreciated.
> >
> > Cheers,
> > Ronan
>
>
>
> --
> Harsh J
>
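
The netstat line quoted above shows the JobTracker listening on
127.0.0.1:54311, which would explain why remote connections are refused
while local ones succeed. As a rough sketch, a listener line can be checked
for a loopback-only bind as shown here. The function name is hypothetical,
and the code assumes the usual Linux netstat column order, with the local
address in the fourth column.

```python
import ipaddress

def bound_to_loopback(netstat_line):
    """Return True if the Local Address column holds a loopback address."""
    local = netstat_line.split()[3]         # 4th column: local address:port
    host, _, _port = local.rpartition(":")  # split the port off the end
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:                      # e.g. "*" or an unparsable host
        return False

# Listener line for the JobTracker as quoted in the thread.
LINE = ("tcp        0      0 127.0.0.1:54311         0.0.0.0:*"
        "               LISTEN      1001       425093      21542/java")

print(bound_to_loopback(LINE))  # True
```

A listener reachable from other nodes would instead show the node's
external address or a wildcard such as 0.0.0.0 in that column, for which
the function returns False.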