RE: Hadoop Cluster setup on EC2 instances [Ubuntu 12.04 x64 based machines]
A Geek 2012-12-03, 03:41
Thanks Harsh. As per your comments, I removed the loopback address mapping for the hostname and added the LAN IP instead, copied the same content to all 3 slave machines, and everything started working.
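For anyone following along, the working /etc/hosts ends up looking roughly like the sketch below. Only nutchcluster1 is named in this thread; the slave hostnames and the 10.0.0.x addresses are placeholders, so substitute your own EC2 private IPs and hostnames. The same file goes on the master and every slave:

127.0.0.1   localhost
10.0.0.11   nutchcluster1    # master (NameNode + JobTracker, also DataNode/TaskTracker)
10.0.0.12   nutchcluster2    # slave (placeholder hostname)
10.0.0.13   nutchcluster3    # slave (placeholder hostname)
10.0.0.14   nutchcluster4    # slave (placeholder hostname)
# Note: no line maps nutchcluster1 to 127.0.0.1 any more.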
Thanks Nitin for pointing me to Whirr. I'd had a quick look at Whirr earlier, but thought it might be complex to set things up, so I did everything manually. Now it looks like Whirr is quite a useful tool; I'll take a look.
Thanks to the Hadoop community, my cluster is now up and running.
Date: Mon, 3 Dec 2012 00:18:48 +0530
Subject: Re: Hadoop Cluster setup on EC2 instances [Ubuntu 12.04 x64 based machines]
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
If you want to set up a Hadoop cluster on AWS, just try using Whirr. Basically it does everything for you.
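For reference, launching a Hadoop cluster with Whirr boils down to one properties file and one command. A minimal recipe looks roughly like this (the cluster name, instance counts and key paths below are placeholders, and the exact properties may vary with the Whirr version):

# hadoop.properties (placeholder file name)
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${whirr.private-key-file}.pub

The cluster is then launched with something like: bin/whirr launch-cluster --config hadoop.properties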
On Sun, Dec 2, 2012 at 10:12 PM, Harsh J <[EMAIL PROTECTED]> wrote:
Your problem is that your /etc/hosts file has the line:
Just delete that line and restart your services. You intend your hostname
"nutchcluster1" to be externally accessible, so aliasing it to the
loopback address (127.0.0.1) is not right.
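The offending entry would be something along these lines:

127.0.0.1   nutchcluster1

With that mapping in place, the NameNode and JobTracker resolve the hostname to 127.0.0.1 and bind only to the loopback interface, which is exactly why the netstat output further down shows 127.0.0.1:54310 and 127.0.0.1:54320 instead of the machine's LAN address.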
On Sun, Dec 2, 2012 at 10:08 PM, A Geek <[EMAIL PROTECTED]> wrote:
> Just to add the version details: I'm running Apache Hadoop release 1.0.4
> with jdk1.6.0_37. Each underlying Ubuntu 12.04 machine has 300GB of disk
> space, 1.7GB of RAM, and a single core.
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Hadoop Cluster setup on EC2 instances [Ubuntu 12.04 x64 based machines]
> Date: Sun, 2 Dec 2012 15:55:09 +0000
> Hi All,
> I'm trying to set up a Hadoop cluster using 4 machines [4 x Ubuntu 12.04 x64].
> Using the following doc:
> I'm able to set up the Hadoop cluster with the required configuration. I can
> see that all the required services on the master and slave nodes are running
> as expected [please see the jps command output below]. The problem I'm facing
> is that the HDFS and MapReduce daemons are running on the master but can be
> accessed from the master only, and not from the slave machines. Note that
> I've opened these ports in the EC2 security group, and I can browse the
> master machine's UI from a web browser, using: http://<machine
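> (For reference, since I'm running the master as both a NameNode and a
> DataNode, the daemons one would expect jps to list on the master are roughly:
> NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker and Jps
> itself; the slaves should show just DataNode, TaskTracker and Jps.)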
> Now, the problem I'm facing is that the HDFS NameNode as well as the
> JobTracker are accessible from the master machine [I'm using the master as
> both NameNode and DataNode], but the ports used for these two [HDFS: 54310
> and MapReduce: 54320] are not accessible from the other slave nodes.
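> (A quick way to confirm this from one of the slaves is a plain TCP check
> against the master's ports, for example:
>
> telnet nutchcluster1 54310   # HDFS / NameNode RPC port
> telnet nutchcluster1 54320   # MapReduce / JobTracker RPC port
>
> With the daemons bound to 127.0.0.1, as in the netstat output below, these
> connections are refused even though the EC2 security group has the ports
> open.)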
> I ran netstat -puntl on the master machine and got this:
> hadoop@nutchcluster1:~/hadoop$ netstat -puntl
> (Not all processes could be identified, non-owned process info
> will not be shown, you would have to be root to see it all.)
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address     Foreign Address  State   PID/Program name
> tcp        0      0 0.0.0.0:22        0.0.0.0:*        LISTEN
> tcp6       0      0 :::50020          :::*             LISTEN
> tcp6       0      0 127.0.0.1:54310   :::*             LISTEN
> tcp6       0      0 127.0.0.1:32776   :::*             LISTEN
> tcp6       0      0 :::57065          :::*             LISTEN
> tcp6       0      0 :::50090          :::*             LISTEN
> tcp6       0      0 :::50060          :::*             LISTEN
> tcp6       0      0 :::50030          :::*             LISTEN
> tcp6       0      0 127.0.0.1:54320   :::*             LISTEN
> tcp6       0      0 :::45747          :::*             LISTEN