Re: DataNode and TaskTracker communication

The key is to think about what can go wrong, but start with the low-hanging fruit.

I mean, you could be right; however, you're jumping the gun and overlooking simpler issues.

The most common issue is that the network traffic is being filtered.
Of course, since we're both diagnosing this with minimal information, we're kind of shooting from the hip.

This is why I'm asking if there is any network traffic between the nodes. If you have partial communication, then focus on why you can't see the specific traffic.
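
For what it's worth, a quick way to tell "node unreachable" apart from "port blocked" is to try a plain TCP connect to the ports in question. Here is a minimal Python sketch, assuming nothing about your setup. The names port_open and check_ports are made up for illustration, and a False result only says the connect failed (refused, filtered, timed out, or the name didn't resolve), not why.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (connect succeeded) or False (connect failed)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Run from the node that is having trouble, something like check_ports("datanode1.example.com", [50010, 50060]) would do. The host name and port numbers there are placeholders, so use whatever your own config specifies. If ping works but a port comes back False, the firewall rules are the next thing to look at.
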
On Aug 13, 2012, at 10:05 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Thank you so very much for the detailed response Michael. I'll keep the tip in mind. Please pardon my ignorance, as I am still in the learning phase.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Mon, Aug 13, 2012 at 8:29 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> 0.0.0.0 means that the call is going to all interfaces on the machine.  (Shouldn't be an issue...)
>
> IPv4 vs IPv6? Could be an issue; however, the OP says he can write data to the DNs and they seem to communicate. If it's IPv6-related, wouldn't it impact all traffic and not just a specific port?
> I agree... shut down IPv6 if you can.
>
> I don't disagree with your assessment. I am just suggesting that before you do a really deep dive, you think about the more obvious stuff first.
>
> There are a couple of other things... like do all of the /etc/hosts files on all of the machines match?
> Is the OP using both /etc/hosts and DNS? If so, are they in sync?
>
> BTW, you said DNS in your response. If you're using DNS, then you don't really want to have much info in the /etc/hosts file except loopback and the server's IP address.
>
> Looking at the problem, the OP is indicating that some traffic works while other traffic doesn't. Most likely something is blocking the ports. Iptables is the first place to look.
>
> Just saying. ;-)
>
>
> On Aug 13, 2012, at 9:12 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Hi Michael,
>>        I asked for the hosts file because there seems to me to be some loopback problem. The log shows that the call is going to 0.0.0.0. Apart from what you have said, I think disabling IPv6 and making sure that there is no problem with the DNS resolution are also necessary. Please correct me if I am wrong. Thank you.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>> Based on your /etc/hosts output, why aren't you using DNS?
>>
>> Outside of MapR, multihomed machines can be problematic. Hadoop doesn't generally work well when you're not using the FQDN or its alias.
>>
>> The issue isn't SSH. Go to the node that is having trouble connecting to another node and try to ping that other node, or try some other general communication. If that succeeds, your issue is that the port you're trying to communicate with is blocked. Then it's more than likely an IP configuration or firewall issue.
>>
>> On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Michael,
>>>
>>> Well, I can ssh from any node to any other without being prompted. The reason for this is that my home dir is mounted on every server in the cluster.
>>>
>>> Whether the machines are multihomed, I don't know. I could ask, if this is of importance.
>>>
>>> Shall I?
>>>
>>> Regards,
>>> Elmar
>>>
>>> On 13.08.12 14:59, Michael Segel wrote:
>>>> If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.
>>>>
>>>> A more relevant question is whether he's running a firewall on each of these machines.
>>>>
>>>> A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check to see if there is a firewall running which would limit the types of traffic between nodes.
>>>>
>>>> One other side note... are these machines multi-homed?
>>>>
>>>> On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote: