MapReduce, mail # user - DataNode and Tasttracker communication


Re: DataNode and Tasttracker communication
Mohammad Tariq 2012-08-13, 14:12
Hi Michael,
       I asked for the hosts file because this looks like a loopback problem
to me. The log shows that the call is going to 0.0.0.0. Apart from what you
have said, I think it is also necessary to disable IPv6 and to make sure
that DNS resolution works correctly. Please correct me if I am wrong.
Thank you.

Regards,
    Mohammad Tariq
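
The two checks discussed in this thread (does a hostname resolve to a loopback or 0.0.0.0 address, and is a given TCP port actually reachable even though ping works) could be sketched roughly as below. This is only an illustration using the Python standard library, not anything Hadoop ships; the function names `resolve_addresses`, `is_suspicious` and `port_is_reachable` are made up for this example, and the host and port in the usage part are simply the ones that appear in the quoted log excerpt.

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP address strings for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def is_suspicious(address):
    """True for loopback (127.x.x.x, ::1) or unspecified (0.0.0.0, ::) addresses."""
    # Drop an IPv6 scope suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    return ip.is_loopback or ip.is_unspecified


def port_is_reachable(host, port, timeout=3.0):
    """Attempt a plain TCP connect; True if it succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: host and port copied from the log excerpt in this thread.
    host, port = "its-cs131", 35554
    try:
        for addr in resolve_addresses(host):
            flag = "SUSPICIOUS" if is_suspicious(addr) else "ok"
            print(f"{host} -> {addr} [{flag}]")
    except socket.gaierror as exc:
        print(f"could not resolve {host}: {exc}")
    print(f"{host}:{port} reachable: {port_is_reachable(host, port)}")
```

Running this on each node in turn would show whether the master's name resolves to a routable address everywhere, and whether the port is blocked on nodes where ping still succeeds. A failed connect alone does not say whether a firewall or a missing listener is the cause.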

On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Based on your /etc/hosts output, why aren't you using DNS?
>
> Outside of MapR, multihomed machines can be problematic. Hadoop doesn't
> generally work well when you're not using the FQDN or its alias.
>
> The issue isn't SSH. Go to the node that is having trouble connecting to
> another node and try to ping that node, or try some other general
> communication. If that succeeds, your issue is that the port you're trying
> to communicate with is blocked. Then it's more than likely an ipconfig or
> firewall issue.
>
> On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <[EMAIL PROTECTED]>
> wrote:
>
>  Hi Michael,
>
> Well, I can ssh from any node to any other without being prompted. The
> reason for this is that my home dir is mounted on every server in the
> cluster.
>
> Whether the machines are multihomed, I don't know. I could ask if this
> would be of importance.
>
> Shall I?
>
> Regards,
> Elmar
>
> On 13.08.12 14:59, Michael Segel wrote:
>
> If the nodes can communicate and distribute data, then the odds are that
> the issue isn't going to be in his /etc/hosts.
>
>  A more relevant question is whether he's running a firewall on each of
> these machines.
>
>  A simple test... ssh to one node, ping other nodes and the control nodes
> at random to see if they can see one another. Then check to see if there is
> a firewall running which would limit the types of traffic between nodes.
>
>  One other side note... are these machines multi-homed?
>
>   On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
> Hello there,
>
>       Could you please share your /etc/hosts file, if you don't mind.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I am currently trying to run my Hadoop program on a cluster. Sadly, my
>> datanodes and tasktrackers seem to have difficulties communicating, as
>> their logs show:
>> * Some datanodes and tasktrackers seem to have port problems of some kind,
>> as can be seen in the logs below. I wondered whether this might be related
>> to the localhost entry in /etc/hosts, as many posts about similar errors
>> suggest, but I checked the file: neither localhost nor 127.0.0.1/127.0.1.1
>> is bound there. (You can still ping localhost, though. The technician of
>> the cluster said he'd look into the mechanism that resolves localhost.)
>> * The other nodes cannot talk to the namenode and jobtracker
>> (its-cs131). It is not at all clear why this is happening: the "dfs -put"
>> I do directly before the job runs fine, which seems to imply that
>> communication between those servers is working flawlessly.
>>
>> Is there any reason why this might happen?
>>
>>
>> Regards,
>> Elmar
>>
>> LOGS BELOW:
>>
>> \____Datanodes
>>
>> After successfully putting the data into HDFS (at this point I thought
>> namenode and datanodes have to communicate), I get the following errors
>> when starting the job:
>>
>> There are 2 kinds of logs I found: the first one is big (about 12MB) and
>> looks like this:
>> ############################### LOG TYPE 1
>> ############################################################
>> 2012-08-13 08:23:27,331 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 0
>> time(s).
>> 2012-08-13 08:23:28,332 INFO org.apache.hadoop.ipc.Client: Retrying
>> connect to server: its-cs131/141.51.205.41:35554. Already tried 1
>> time(s).
>> 2012-08-13 08:23:29,332 INFO org.apache.hadoop.ipc.Client: Retrying