Harsh J 2012-12-02, 16:42
Hadoop Cluster setup on EC2 instances [Ubuntu 12.04 x64 based machines]


Hi All, I'm trying to set up a Hadoop cluster using 4 machines [4 x Ubuntu 12.04 x64], using the following doc:
1. http://titan.softnet.tuc.gr:8082/User:xenia/Page_Title/Hadoop_Cluster_Setup_Tutorial
I'm able to set up the Hadoop cluster with the required configurations, and I can see that all the required services on the master and slave nodes are running as expected [please see the jps command output below]. The problem I'm facing is that the HDFS and MapReduce daemons running on the master can be accessed from the master only, and not from the slave machines. Note that I've added these ports to the EC2 security group to open them, and I can browse the master machine's UI from a web browser using: http://<machine ip>:50070/dfshealth.jsp
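(As a quick sanity check that the EC2 security group itself is not the problem: the 50070 web port, which the netstat output below shows bound to all interfaces, should also be reachable from a slave. A minimal check, assuming curl is available on the slave:)

# Run on a slave node, e.g. nutchcluster2:
curl -sI http://nutchcluster1:50070/dfshealth.jsp | head -n 1
# An HTTP 200 status line here means the security group passes traffic to the master.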

Now, the problem I'm facing is that both HDFS and the JobTracker are accessible from the master machine [I'm using the master as both NameNode and DataNode], but the two ports they use [HDFS: 54310 and MapReduce: 54320] are not accessible from the other slave nodes.
I ran netstat -puntl on the master machine and got this:
hadoop@nutchcluster1:~/hadoop$ netstat -puntl
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp6       0      0 :::50020                :::*                    LISTEN      6224/java
tcp6       0      0 127.0.0.1:54310         :::*                    LISTEN      6040/java
tcp6       0      0 127.0.0.1:32776         :::*                    LISTEN      6723/java
tcp6       0      0 :::57065                :::*                    LISTEN      6040/java
tcp6       0      0 :::50090                :::*                    LISTEN      6401/java
tcp6       0      0 :::50060                :::*                    LISTEN      6723/java
tcp6       0      0 :::50030                :::*                    LISTEN      6540/java
tcp6       0      0 127.0.0.1:54320         :::*                    LISTEN      6540/java
tcp6       0      0 :::45747                :::*                    LISTEN      6401/java
tcp6       0      0 :::33174                :::*                    LISTEN      6540/java
tcp6       0      0 :::50070                :::*                    LISTEN      6040/java
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 :::54424                :::*                    LISTEN      6224/java
tcp6       0      0 :::50010                :::*                    LISTEN      6224/java
tcp6       0      0 :::50075                :::*                    LISTEN      6224/java
udp        0      0 0.0.0.0:68              0.0.0.0:*                           -
hadoop@nutchcluster1:~/hadoop$

As can be seen in the output, both the HDFS and MapReduce daemons are listening, but only on 127.0.0.1 and not on 0.0.0.0 [i.e., they cannot be reached from any other machine/slave machine]:

tcp6       0      0 127.0.0.1:54310         :::*                    LISTEN      6040/java
tcp6       0      0 127.0.0.1:54320         :::*                    LISTEN      6540/java
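(A likely explanation, offered as an assumption since the /etc/hosts content is cut off further down: on Ubuntu the machine's own hostname is often mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts, and in Hadoop 1.x the NameNode and JobTracker bind to whatever address the fs.default.name / mapred.job.tracker hostname resolves to locally, which would produce exactly these loopback-only listeners. A sketch of the check on the master:)

# On the master, see which address 'nutchcluster1' resolves to locally.
getent hosts nutchcluster1
# If this prints 127.0.0.1 or 127.0.1.1, remove or comment out that mapping in
# /etc/hosts so the name resolves to the private EC2 IP (10.4.39.23 in the logs),
# then restart the daemons; netstat should then show 10.4.39.23:54310 and :54320.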

To confirm the same, I did this on the master:

hadoop@nutchcluster1:~/hadoop$ bin/hadoop fs -ls hdfs://nutchcluster1:54310/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2012-12-02 12:53 /home
hadoop@nutchcluster1:~/hadoop$
But when I ran the same command on a slave, I got this:

hadoop@nutchcluster2:~/hadoop$ bin/hadoop fs -ls hdfs://nutchcluster1:54310/
12/12/02 15:42:16 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 0 time(s).
12/12/02 15:42:17 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 1 time(s).
12/12/02 15:42:18 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 2 time(s).
12/12/02 15:42:19 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 3 time(s).
12/12/02 15:42:20 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 4 time(s).
12/12/02 15:42:21 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 5 time(s).
12/12/02 15:42:22 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 6 time(s).
12/12/02 15:42:23 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 7 time(s).
12/12/02 15:42:24 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 8 time(s).
12/12/02 15:42:25 INFO ipc.Client: Retrying connect to server: nutchcluster1/10.4.39.23:54310. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to nutchcluster1/10.4.39.23:54310 failed on connection exception: java.net.ConnectException: Connection refused
hadoop@nutchcluster2:~/hadoop$
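(For what it's worth, "Connection refused" rather than a timeout already points at a bind problem on the master rather than a blocked security group: the packet reaches the host, but nothing is listening on that interface. A minimal reachability check from a slave, assuming netcat is installed:)

# Run on nutchcluster2: probe the NameNode RPC port over TCP.
nc -vz nutchcluster1 54310
# "Connection refused" -> port reaches the master but no listener on that interface
# timeout / no route   -> firewall or EC2 security group issue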
The configurations are as below:
<property>
  <name>fs.default.name</name>
  <value>hdfs://nutchcluster1:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
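(This value determines both where clients connect and, after local hostname resolution on the master, which address the NameNode binds to. A quick check that every node carries the same setting, assuming the stock conf/ layout:)

# Run on the master and each slave; the host:port should match everywhere.
grep -A 1 'fs.default.name' conf/core-site.xml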

<property>
  <name>mapred.reduce.tasks</name>
  <value>40</value>
  <description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
  </description>
</property>
</configuration>
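(The MapReduce port 54320 mentioned above is not shown in this snippet; presumably it comes from a mapred-site.xml entry along the following lines. This is a sketch, not the poster's actual file:)

<property>
  <name>mapred.job.tracker</name>
  <value>nutchcluster1:54320</value>
  <description>JobTracker host and port; as with fs.default.name, the host
  part is resolved locally on the master to pick the bind address.</description>
</property>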

I replicated all of the above on the other 3 slave machines [1 master + 3 slaves]. My /etc/hosts content on the master node is as below. Note that I have the same content on the slave nodes.
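(The /etc/hosts content is cut off in the archived message. For illustration only, a working layout on the EC2 private network would look roughly like the sketch below; 10.4.39.23 is the master IP seen in the logs above, and the slave IPs are placeholders:)

127.0.0.1    localhost
# 127.0.1.1  nutchcluster1    <- Ubuntu's default self-mapping; remove or comment out
10.4.39.23   nutchcluster1
10.4.39.x    nutchcluster2    # placeholder: each slave's private IP
10.4.39.y    nutchcluster3    # placeholder
10.4.39.z    nutchcluster4    # placeholder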
A Geek 2012-12-03, 03:41