-
Re: DataNode and TaskTracker communication
Michael Segel 2012-08-13, 14:59
0.0.0.0 means that the daemon is binding to (listening on) all interfaces on the machine. (Shouldn't be an issue...)
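To illustrate the 0.0.0.0 point: it is the wildcard bind address, so a listener on it accepts connections arriving on any local interface, loopback included, while a listener on 127.0.0.1 can only be reached from the same machine. A minimal sketch with plain Python sockets (not Hadoop code); the ports are picked by the OS:

```python
import socket


def listen_on(addr: str) -> socket.socket:
    """Open a TCP listener on addr, letting the kernel choose a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((addr, 0))  # port 0 = any free port
    srv.listen()
    return srv


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Wildcard listener: reachable through every local IPv4 address, loopback included.
wildcard = listen_on("0.0.0.0")
wildcard_port = wildcard.getsockname()[1]
print("bound to", wildcard.getsockname()[0], "reachable via 127.0.0.1:",
      can_connect("127.0.0.1", wildcard_port))

# Loopback-only listener: other nodes could never reach this one.
loopback = listen_on("127.0.0.1")
print("bound to", loopback.getsockname()[0])
```

In other words, a wildcard bind is not the worrying case; a daemon that ends up bound to a loopback address would be.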
IPv4 vs IPv6? Could be an issue. However, the OP says he can write data to the DNs and they seem to communicate, so if it's IPv6-related, wouldn't it impact all traffic and not just a specific port? I agree... shut down IPv6 if you can.
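Before shutting IPv6 down, one quick check is which address families a node name actually resolves to. A small sketch built on the standard `socket.getaddrinfo`; the node names you pass in are your own:

```python
import socket


def address_families(host: str, port: int = 0) -> dict:
    """Group the addresses `host` resolves to by family (IPv4 vs IPv6)."""
    found = {"IPv4": set(), "IPv6": set()}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        if family == socket.AF_INET:
            found["IPv4"].add(sockaddr[0])
        elif family == socket.AF_INET6:
            found["IPv6"].add(sockaddr[0])
    return found
```

Calling it as `address_families("node1")` (a placeholder name) on each machine shows whether a name has IPv6 addresses at all; if it has entries in both sets, which one a client ends up using depends on resolver and JVM preferences.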
I don't disagree with your assessment. I am just suggesting that before you do a really deep dive, you think about the more obvious stuff first.
There are a couple of other things... like do all of the /etc/hosts files on all of the machines match? Is the OP using both /etc/hosts and DNS? If so, are they in sync?
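Checking whether the /etc/hosts files match can be done by collecting a copy from each node (however you prefer, e.g. scp) and diffing the name-to-address mappings rather than the raw text. A sketch; the host names and addresses below are hypothetical:

```python
def parse_hosts(text: str) -> dict:
    """Map each hostname/alias in an /etc/hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # the first matching line wins
    return mapping


def diff_hosts(a: str, b: str) -> dict:
    """Return {name: (ip_in_a, ip_in_b)} for every name the two files disagree on."""
    ha, hb = parse_hosts(a), parse_hosts(b)
    return {name: (ha.get(name), hb.get(name))
            for name in sorted(ha.keys() | hb.keys())
            if ha.get(name) != hb.get(name)}


# Hypothetical copies of /etc/hosts collected from two nodes.
node1_hosts = """127.0.0.1   localhost
10.0.0.11   node1.cluster node1
10.0.0.12   node2.cluster node2
"""
node2_hosts = """127.0.0.1   localhost
10.0.0.11   node1.cluster node1
10.0.0.21   node2.cluster node2   # stale address
"""
print(diff_hosts(node1_hosts, node2_hosts))
# -> {'node2': ('10.0.0.12', '10.0.0.21'), 'node2.cluster': ('10.0.0.12', '10.0.0.21')}
```

Comparing parsed mappings means harmless differences in spacing, comments or line order don't show up as mismatches.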
BTW, you mentioned DNS in your response. If you're using DNS, then you don't really want much in the /etc/hosts file except the loopback entry and the server's own IP address.
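With the usual `hosts: files dns` order in /etc/nsswitch.conf, anything in /etc/hosts takes precedence over DNS. A sketch that flags the lines a DNS-based setup would not need, i.e. everything except loopback/multicast entries and the server's own entry. `own_names` is an assumed input: the names this particular server is known by.

```python
import ipaddress


def unexpected_entries(hosts_text: str, own_names: set) -> list:
    """List /etc/hosts lines that are neither loopback/multicast entries nor the
    server's own entry; with DNS in use, each of these may shadow a DNS record."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            flagged.append(line)  # malformed address: worth a look as well
            continue
        if addr.is_loopback or addr.is_multicast:
            continue
        if names and set(names) <= own_names:
            continue  # the server's own name(s) and IP
        flagged.append(line)
    return flagged
```

For example, on a hypothetical node1, `unexpected_entries(open("/etc/hosts").read(), {"node1.cluster", "node1"})` would list every other node that is still pinned in the file.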
Looking at the problem, the OP is indicating that some traffic works while other traffic doesn't. Most likely something is blocking the ports. iptables is the first place to look.
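To tell "blocked" apart from "nothing listening", a plain TCP connect test from the affected node helps, because the two usually fail differently. A sketch; the outcome labels are my own naming:

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port on host by attempting a connection.

    'open'     - handshake completed
    'refused'  - connection actively rejected: usually nothing is listening,
                 though a firewall REJECT rule can cause this too
    'filtered' - no answer before the timeout, typical of a DROP rule
    Other failures are reported with the OS error text, e.g. 'No route to host'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "filtered"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
```

Run it from the node that reports the problem, e.g. `probe("node2", 50010)` with your own node name. If ping works but the result is "error: No route to host", a REJECT rule is a likely suspect (the icmp-host-prohibited rule in the stock RHEL/CentOS iptables config is a common case).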
Just saying. ;-)
On Aug 13, 2012, at 9:12 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> Hi Michael,
> I asked for the hosts file because it looks to me like there is some loopback problem. The log shows that the call is going to 0.0.0.0. Apart from what you have said, I think disabling IPv6 and making sure that there is no problem with DNS resolution is also necessary. Please correct me if I am wrong. Thank you.
>
> Regards,
> Mohammad Tariq
>
> On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> Based on your /etc/hosts output, why aren't you using DNS?
>
> Outside of MapR, multihomed machines can be problematic. Hadoop doesn't generally work well when you're not using the FQDN or its alias.
>
> The issue isn't SSH. Go to the node which is having trouble connecting to another node and try to ping it, or some other general communication. If that succeeds, your issue is that the port you're trying to communicate with is blocked. Then it's more than likely an ipconfig or firewall issue.
>
> On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
>
>> Hi Michael,
>>
>> Well, I can ssh from any node to any other without being prompted. The reason for this is that my home dir is mounted on every server in the cluster.
>>
>> Whether the machines are multihomed, I don't know. I could ask, if this would be of importance.
>>
>> Shall I?
>>
>> Regards,
>> Elmar
>>
>> On 13.08.12 14:59, Michael Segel wrote:
>>> If the nodes can communicate and distribute data, then the odds are that the issue isn't going to be in his /etc/hosts.
>>>
>>> A more relevant question is whether he's running a firewall on each of these machines.
>>>
>>> A simple test... ssh to one node, ping other nodes and the control nodes at random to see if they can see one another. Then check whether there is a firewall running which would limit the types of traffic between nodes.
>>>
>>> One other side note... are these machines multi-homed?
>>>
>>> On Aug 13, 2012, at 7:51 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello there,
>>>>
>>>> Could you please share your /etc/hosts file, if you don't mind.
>>>>
>>>> Regards,
>>>> Mohammad Tariq
>>>>
>>>> On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> I am currently trying to run my Hadoop program on a cluster. Sadly, though, my datanodes and tasktrackers seem to have difficulties with their communication, as their logs say:
>>>> * Some datanodes and tasktrackers seem to have port problems of some kind, as can be seen in the logs below. I wondered if this might be related to the localhost entry in /etc/hosts, as you can read in a lot of posts with similar errors, but I checked the file and neither localhost nor 127.0.0.1/127.0.1.1 is bound there (although you can ping localhost... the technician of the cluster said he'd be looking into the mechanism resolving localhost)
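Related to the localhost/127.0.1.1 question: one thing worth ruling out on each node is whether the machine's own hostname resolves to a loopback address, since a daemon that advertises that name may then hand other nodes an address they cannot reach. A sketch; the `resolve` parameter is a hypothetical hook that only exists so a fake resolver can be swapped in for testing:

```python
import ipaddress
import socket


def own_name_is_loopback(hostname=None, resolve=socket.gethostbyname_ex):
    """Resolve this machine's name and report any loopback addresses it maps to.

    Returns (hostname, addresses, loopback_addresses). A non-empty third element
    (e.g. ['127.0.1.1']) means the node's own name points back at itself.
    """
    hostname = hostname or socket.getfqdn()
    _canonical, _aliases, addresses = resolve(hostname)
    loopback = [a for a in addresses if ipaddress.ip_address(a).is_loopback]
    return hostname, addresses, loopback
```

Called with no arguments it checks the local machine's fully qualified name, which covers both /etc/hosts and DNS as configured in nsswitch.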
-
Re: DataNode and TaskTracker communication
Mohammad Tariq 2012-08-13, 15:05
Thank you so very much for the detailed response, Michael. I'll keep the tip in mind. Please pardon my ignorance, as I am still in the learning phase.
Regards, Mohammad Tariq
-
Re: DataNode and TaskTracker communication
Björn-Elmar Macek 2012-08-14, 11:22
Hi James,
Thank you for your reply!
I tried to, but I can only see my own processes, since I am not a root user. :( I have already sent a request to the cluster admins to sort this out for me.
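Without root, `netstat -p` and `lsof` only show process details for your own processes, but on Linux the kernel's socket tables in /proc/net/tcp and /proc/net/tcp6 are normally readable by every user and include the UID owning each socket. A Linux-only sketch that finds which user holds a listening port such as 50010 or 50060:

```python
def listeners(proc_net_tcp: str, port: int) -> list:
    """Return (local_address_hex, uid) for LISTEN sockets on `port`.

    Expects the text of /proc/net/tcp or /proc/net/tcp6: a header line, then one
    socket per line with local_address as HEXADDR:HEXPORT in the 2nd column,
    the state in the 4th (0A = LISTEN) and the owning UID in the 8th.
    """
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        addr_hex, port_hex = fields[1].rsplit(":", 1)
        if fields[3] == "0A" and int(port_hex, 16) == port:
            found.append((addr_hex, int(fields[7])))
    return found


def port_owners(port: int) -> list:
    """Scan both tables: a JVM listener may show up only in the IPv6 one."""
    owners = []
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as fh:
                owners += listeners(fh.read(), port)
        except OSError:
            pass  # table missing (no IPv6 support, or not Linux)
    return owners
```

`port_owners(50010)` returns (address, uid) pairs; `pwd.getpwuid(uid).pw_name` turns a UID into a user name, which is often enough to tell whether a leftover DataNode belongs to you, another user, or the hadoop account.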
Regards, Björn
On 14.08.2012 08:51, James Brown wrote:
> Hi Bjorn,
>
> For the two items below, it is possible that datanodes and tasktrackers are
> already running.
>
> This command will show processes bound to the datanode port:
> netstat -putan | grep 50010
>
> tasktracker port:
> netstat -putan | grep 50060
>
> If your netstat command does not support the -p option, try lsof.
>
>> \____Datanodes
> ...
>> The second is the short kind:
>> ########################### LOG TYPE 2
>> ############################################################
>> 2012-08-13 00:59:19,038 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> ...
>> 2012-08-13 00:59:21,898 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException:
>> Problem binding to /0.0.0.0:50010 : Address already in use
>
> ...
>
>> \_____TastTracker
> ...
>> ########################### LOG TYPE 2
>> ############################################################
>> 2012-08-13 00:59:24,376 INFO org.apache.hadoop.mapred.TaskTracker:
>> STARTUP_MSG:
> ...
>> 2012-08-13 00:59:38,161 ERROR org.apache.hadoop.mapred.TaskTracker: Can
>> not start task tracker because java.net.BindException: Address already
>> in use
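The BindException in these logs is a local condition: the daemon could not open its listening socket because something on the same machine already holds the port, which no firewall rule would cause. Even without root you can check for that by attempting the same kind of bind yourself. A sketch (plain Python sockets, not how Hadoop itself checks):

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Try the same kind of bind a daemon would do; EADDRINUSE means some other
    process already holds the port (for example a DataNode that is still running)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ignore leftover TIME_WAIT connections so only a live listener counts.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind((host, port))
        return False
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES for privileged ports: a different problem
    finally:
        s.close()


for port in (50010, 50060):  # the DataNode and TaskTracker ports from James's mail
    print(port, "in use" if port_in_use(port) else "free")
```

If a port shows as in use before the daemon starts, a previous instance (possibly started by another user) is the first thing to look for.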
-
Re: DataNode and TaskTracker communication
Björn-Elmar Macek 2012-08-14, 11:25
Hi Michael and Mohammad,
Thanks a lot for your input! I have pinged the people running the cluster and asked them to possibly disable IPv6 and to definitely check the ports on the relevant machines. I will keep you updated.
Regards, Elmar
On 13.08.2012 22:39, Michael Segel wrote:
>
> The key is to think about what can go wrong, but start with the low-hanging fruit.
>
> I mean, you could be right; however, you're jumping the gun and are overlooking simpler issues.
>
> The most common issue is that the networking traffic is being filtered. Of course, since we're both diagnosing this with minimal information, we're kind of shooting from the hip.
>
> This is why I'm asking if there is any networking traffic between the nodes. If you have partial communication, then focus on why you can't see the specific traffic.
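Michael's approach (first establish whether any traffic flows between the nodes, then focus on the specific traffic that fails) can be sketched as a small summary over connection-test results. The input format, the outcome labels ("open", "filtered", and so on) and the node names are assumptions for illustration; the results could come from `nc -z` or any small socket-based check.

```python
from collections import defaultdict


def diagnose(results: dict) -> dict:
    """Summarise per-host outcomes from {(host, port): outcome} connection tests.

    'all probed ports reachable' - nothing to chase on this host
    'nothing reachable'          - check basic connectivity (ping, routing) first
    'partial'                    - some ports work, others don't: look at firewall
                                   rules and listeners for the failing ports
    """
    by_host = defaultdict(dict)
    for (host, port), outcome in results.items():
        by_host[host][port] = outcome
    summary = {}
    for host, ports in sorted(by_host.items()):
        bad = sorted(p for p, o in ports.items() if o != "open")
        if not bad:
            summary[host] = "all probed ports reachable"
        elif len(bad) == len(ports):
            summary[host] = "nothing reachable: check basic connectivity first"
        else:
            summary[host] = f"partial: ports {bad} fail while others work"
    return summary


# Hypothetical results gathered from one client node.
results = {
    ("node1", 50010): "open", ("node1", 50060): "open",
    ("node2", 50010): "open", ("node2", 50060): "filtered",
}
for host, verdict in diagnose(results).items():
    print(host, "->", verdict)
```

The "partial" case is the one Michael describes: basic reachability is proven, so the failing ports themselves are where to dig.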