Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> DataNode and Tasttracker communication


Copy link to this message
-
Re: DataNode and Tasttracker communication
If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.

A more relevant question is if he's running a firewall on each of these machines?

A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check to see if there is a firewall running which would limit the types of traffic between nodes.

One other side note... are these machines multi-homed?

On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello there,
>
>      Could you please share your /etc/hosts file, if you don't mind.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
> Hi,
>
> i am currently trying to run my hadoop program on a cluster. Sadly though my datanodes and tasktrackers seem to have difficulties with their communication as their logs say:
> * Some datanodes and tasktrackers seem to have portproblems of some kind as it can be seen in the logs below. I wondered if this might be due to reasons correllated with the localhost entry in /etc/hosts as you can read in alot of posts with similar errors, but i checked the file neither localhost nor 127.0.0.1/127.0.1.1 is bound there. (although you can ping localhost... the technician of the cluster said he'd be looking for the mechanics resolving localhost)
> * The other nodes can not speak with the namenode and jobtracker (its-cs131). Although it is absolutely not clear, why this is happening: the "dfs -put" i do directly before the job is running fine, which seems to imply that communication between those servers is working flawlessly.
>
> Is there any reason why this might happen?
>
>
> Regards,
> Elmar
>
> LOGS BELOW:
>
> \____Datanodes
>
> After successfully putting the data to hdfs (at this point i thought namenode and datanodes have to communicate), i get the following errors when starting the job:
>
> There are 2 kinds of logs i found: the first one is big (about 12MB) and looks like this:
> ############################### LOG TYPE 1 ############################################################
> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 0 time(s).
> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 1 time(s).
> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 2 time(s).
> 2012-08-13 08:23:30,332 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 3 time(s).
> 2012-08-13 08:23:31,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 4 time(s).
> 2012-08-13 08:23:32,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 5 time(s).
> 2012-08-13 08:23:33,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 6 time(s).
> 2012-08-13 08:23:34,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 7 time(s).
> 2012-08-13 08:23:35,334 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 8 time(s).
> 2012-08-13 08:23:36,335 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: its-cs131/141.51.205.41:35554. Already tried 9 time(s).
> 2012-08-13 08:23:36,335 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.ConnectException: Call to its-cs131/141.51.205.41:35554 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)