Re: Data Nodes not seeing NameNode / Task Trackers not seeing JobTracker
Ronan Lehane 2012-07-17, 19:28
Just in case anyone else was seeing a similar problem, this issue was
resolved by removing the loopback addresses from the /etc/hosts files.
Seems to be a problem on Ubuntu.
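
For anyone wanting to check their own nodes, here is a minimal sketch in Python that scans the text of a hosts file and lists any non-localhost name mapped to a loopback address. The function name, the ignore list and the sample entries are only illustrative assumptions (the 127.0.1.1 line mimics what a default Ubuntu install typically writes); it is not taken from the thread.

```python
import ipaddress

def loopback_hostnames(hosts_text, ignore=("localhost", "ip6-localhost", "ip6-loopback")):
    """Return (address, name) pairs where a non-localhost name maps to a loopback address."""
    found = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            # Not a parseable address; skip the line rather than guess.
            continue
        if addr.is_loopback:
            for name in fields[1:]:
                if name not in ignore:
                    found.append((fields[0], name))
    return found

# Hypothetical hosts file content; only the hostname and 10.x address come from the thread.
sample = """\
127.0.0.1    localhost
127.0.1.1    jobtracker
10.21.68.218 jobtracker
"""
print(loopback_hostnames(sample))  # [('127.0.1.1', 'jobtracker')]
```

To run it against a real node, pass in the contents of /etc/hosts, e.g. loopback_hostnames(open("/etc/hosts").read()). Any pair it reports is a candidate for the kind of entry that was removed here.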
On Mon, Jul 16, 2012 at 9:22 PM, Ronan Lehane <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply Harsh.
> I think you may be onto something with the second suggestion.
> I found an earlier thread saying that some of the troubleshooting steps
> outlined below resolved a similar issue for that person:
> Like you suggested, the /etc/hosts file definitely looks to be involved as
> I hit different issues depending on what hostnames are set against the
> loopback addresses.
> I'll try resetting them to see if that resolves the issue.
> On Mon, Jul 16, 2012 at 7:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> A couple of simple things to ensure first:
>> 1. Make sure the firewall isn't the one at fault here. Best to disable
>> the firewall if you do not need it, or carefully configure the rules to
>> allow in/out traffic over the chosen ports.
>> 2. Ensure that the hostnames that fs.default.name and mapred.job.tracker
>> bind to resolve to external IPs and not to localhost
>> (loopback interface bound) addresses.
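
The second check can be sketched in Python as below: resolve a hostname the way the local resolver would and report whether any resulting address is a loopback one. The function name is an illustrative assumption, and the demo uses IP literals so it runs without DNS; on a real node you would pass the hostnames configured in fs.default.name and mapred.job.tracker.

```python
import ipaddress
import socket

def resolves_to_loopback(hostname):
    """Return True if any address the hostname resolves to is a loopback address."""
    for info in socket.getaddrinfo(hostname, None):
        # info[4] is the socket address tuple; its first element is the IP string.
        addr = info[4][0].split("%", 1)[0]  # drop any IPv6 zone suffix
        if ipaddress.ip_address(addr).is_loopback:
            return True
    return False

# IP literals resolve to themselves, so this demo needs no name service.
for host in ("127.0.0.1", "10.21.68.218"):
    print(host, resolves_to_loopback(host))
```

If this returns True for the JobTracker or NameNode hostname when run on that node itself, the daemon is likely to bind to the loopback interface and refuse connections from other machines.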
>> On Tue, Jul 17, 2012 at 12:05 AM, Ronan Lehane <[EMAIL PROTECTED]> wrote:
>> > Hi All,
>> > I was wondering if anyone could help me figure out what's going wrong in my
>> > five-node Hadoop cluster, please?
>> > It consists of:
>> > 1. NameNode
>> > hduser@namenode:/usr/local/hadoop$ jps
>> > 13049 DataNode
>> > 13387 Jps
>> > 12740 NameNode
>> > 13316 SecondaryNameNode
>> > 2. JobTracker
>> > hduser@jobtracker:/usr/local/hadoop$ jps
>> > 21817 TaskTracker
>> > 21448 DataNode
>> > 21542 JobTracker
>> > 21862 Jps
>> > 3. Slave1
>> > hduser@slave1:/usr/local/hadoop$ jps
>> > 21226 DataNode
>> > 21514 Jps
>> > 21463 TaskTracker
>> > 4. Slave2
>> > hduser@slave2:/usr/local/hadoop$ jps
>> > 20938 Jps
>> > 20650 DataNode
>> > 20887 TaskTracker
>> > 5. Slave3
>> > hduser@slave3:/usr/local/hadoop$ jps
>> > 22145 Jps
>> > 21854 DataNode
>> > 22091 TaskTracker
>> > All DataNodes have been kicked off by running start-dfs.sh on the
>> > NameNode
>> > All TaskTrackers have been kicked off by running start-mapred.sh on the
>> > JobTracker
>> > When I try to execute a simple wordcount job from the NameNode I receive
>> > the following error:
>> > 12/07/16 19:25:22 ERROR security.UserGroupInformation:
>> > PriviledgedActionException as:hduser cause:java.net.ConnectException:
>> > to jobtracker/10.21.68.218:54311 failed on connection exception:
>> > java.net.ConnectException: Connection refused
>> > If I check the jobtracker:
>> > 1. I can ping in both directions by both IP and Hostname
>> > 2. I can see that the jobtracker is listening on port 54311
>> > tcp 0 0 127.0.0.1:54311 0.0.0.0:*
>> > LISTEN 1001 425093 21542/java
>> > 3. Telnet to this port from the NameNode fails with "Connection Refused"
>> > telnet: Unable to connect to remote host: Connection refused
>> > This issue can be worked around by moving the JobTracker functionality to
>> > the NameNode, but when this is done the job is executed on the NameNode
>> > rather than distributed across the cluster.
>> > Checking the log files on the slave nodes, I see the Server Not Available
>> > messages described on the wiki page below.
>> > http://wiki.apache.org/hadoop/ServerNotAvailable
>> > The DataNodes are not seeing the NameNode and the TaskTrackers are not
>> > seeing the JobTracker.
>> > Checking the JobTracker web interface, it always states there is only 1
>> > node available.
>> > I've checked the 5 troubleshooting steps provided but it all looks to be ok
>> > in my environment.
>> > Would anyone have any ideas about what could be causing this?
>> > Any help would be appreciated.
>> > Cheers,