MapReduce, mail # user - DataNode and TaskTracker communication


Re: DataNode and TaskTracker communication
Michael Segel 2012-08-13, 13:36
Based on your /etc/hosts output, why aren't you using DNS?

Outside of MapR, multihomed machines can be problematic. Hadoop doesn't generally work well when you're not using the FQDN or its alias.
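
To see what a given node actually resolves a hostname to, something like the following could be run on each machine. This is only a minimal sketch using Python's standard `socket` and `ipaddress` modules; the helper names `resolve_report` and `is_loopback` are made up for illustration and are not part of Hadoop.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the IP address string is a loopback address (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolve_report(name):
    """Collect what the local resolver returns for a hostname.

    Hypothetical helper: returns the name, the FQDN as seen from this
    machine, the resolved IPv4 addresses, and which of them are loopback.
    """
    try:
        _, _, addresses = socket.gethostbyname_ex(name)
    except OSError:
        # Resolution failed (no DNS entry, no /etc/hosts entry, ...).
        addresses = []
    return {
        "name": name,
        "fqdn": socket.getfqdn(name),
        "addresses": addresses,
        "loopback": [a for a in addresses if is_loopback(a)],
    }
```

Comparing the output of `resolve_report("its-cs131")` across nodes would show whether every machine sees the same address for the master. A non-empty "loopback" list for a remote hostname usually points at a bad hosts entry.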

The issue isn't SSH. Go to the node that is having trouble connecting to another node and try to ping that other node, or try some other general communication. If that succeeds, your issue is that the port you're trying to communicate with is blocked. Then it's more than likely an IP configuration or firewall issue.
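
If ping works but the daemon port does not, a plain TCP connect from the affected node makes that visible. A minimal sketch, assuming Python is available on the nodes; `can_connect` is a hypothetical helper name, and the host and port in the comment are simply the ones from the log quoted further down.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call, with host and port taken from the log lines in this thread:
#   can_connect("its-cs131", 35554)
```

A False result while ping succeeds points toward a blocked or unbound port rather than a routing problem.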

On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:

> Hi Michael,
>
> Well, I can ssh from any node to any other without being prompted. The reason is that my home directory is mounted on every server in the cluster.
>
> Whether the machines are multihomed, I don't know. I could ask if this is important.
>
> Shall I?
>
> Regards,
> Elmar
>
> On 13.08.12 at 14:59, Michael Segel wrote:
>> If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.
>>
>> A more relevant question is whether he's running a firewall on each of these machines.
>>
>> A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check to see if there is a firewall running which would limit the types of traffic between nodes.
>>
>> One other side note... are these machines multi-homed?
>>
>> On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> Hello there,
>>>
>>>      Could you please share your /etc/hosts file, if you don't mind.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>>
>>> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I am currently trying to run my Hadoop program on a cluster. Sadly, my datanodes and tasktrackers seem to have difficulties communicating, as their logs show:
>>> * Some datanodes and tasktrackers seem to have port problems of some kind, as can be seen in the logs below. I wondered if this might be related to the localhost entry in /etc/hosts, as you can read in a lot of posts with similar errors, but I checked the file: neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (You can still ping localhost, though... the technician of the cluster said he'd look into what mechanism resolves localhost.)
>>> * The other nodes cannot talk to the namenode and jobtracker (its-cs131). It is not at all clear why this is happening: the "dfs -put" I do directly before the job runs fine, which seems to imply that communication between those servers is working flawlessly.
>>>
>>> Is there any reason why this might happen?
>>>
>>>
>>> Regards,
>>> Elmar
>>>
>>> LOGS BELOW:
>>>
>>> \____Datanodes
>>>
>>> After successfully putting the data into HDFS (at this point I thought the namenode and datanodes had to communicate), I get the following errors when starting the job:
>>>
>>> I found two kinds of logs: the first one is big (about 12 MB) and looks like this:
>>> ############################### LOG TYPE 1 ############################################################
>>> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
>>> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
>>> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
>>> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
>>> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
>>> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).