You should definitely give Whirr a try for Hadoop on AWS. It handles most of
these cluster set-up issues for you and works smoothly.
On Sat, Oct 27, 2012 at 1:25 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
> The problem was in EC2 security. While I could ssh without a password into another node and back, I could not telnet to it because of the EC2 firewall. I needed to open the ports for the NN and JT. :)
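For anyone hitting the same firewall symptom, here is a minimal sketch of the telnet-style check done from Python. The helper names `port_open` and `check_ports` are made up for illustration, and the port numbers in the usage note are assumptions only: use whatever ports your fs.default.name and mapred.job.tracker settings actually specify.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames.
        return False

def check_ports(host, ports, timeout=3.0):
    """Map each label in `ports` (label -> port number) to its reachability."""
    return {label: port_open(host, port, timeout) for label, port in ports.items()}
```

Run from the slave, something like check_ports("foo1", {"NN": 8020, "JT": 8021}) would show False for any port that is blocked by the security group or simply has nothing listening on it (8020 and 8021 are placeholders here, not values taken from this thread).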
> Now I can see 2 DNs when running "hadoop fsck" and can also -ls the NN from the slave. Sweet!!!
> Is it possible to balance data over the DNs without copying it with the hadoop -put command? I read about bin/start-balancer.sh somewhere but cannot find it in my current hadoop installation.
> Besides, is balancing data over the DNs going to improve the performance of an MR job?
> Happy Hadooping.
> -----Original Message-----
> From: Nitin Pawar [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 26, 2012 3:18 PM
> To: [EMAIL PROTECTED]
> Subject: Re: cluster set-up / a few quick questions
> 1) Have you set up passwordless ssh between both hosts for the user who owns the hadoop processes (or root)?
> 2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
> 3) If you started them one by one, there is no reason running a command on one node will execute it on the other.
> On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>> Andy, many thanks.
>> I am stuck here now so please put me in the right direction.
>> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now trying a fully distributed one.
>> a. I created another instance foo2 on EC2. Installed hadoop on it and copied conf/ folder from foo1 to foo2. I created /hadoop/dfs/data folder on the local linux system on foo2.
>> b. on foo1 I created file conf/slaves and added:
>> At this point I cannot find an answer on what to do next.
>> I started NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed the number of datanodes as 1. I was expecting DN and TT on foo2 to be started by foo1. But it didn't happen, so I started them myself and tried the command again. Still one DN.
>> I realise that foo2 has no data at this point but I could not find the bin/start-balancer.sh script to help me balance data over the DNs from foo1 to foo2.
>> What do I do next?
>> -----Original Message-----
>> From: Andy Isaacson [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, October 26, 2012 2:21 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: cluster set-up / a few quick questions
>> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>>> - do you put Master's node <hostname> under fs.default.name in core-site.xml on the slave machines or slaves' hostnames?
>> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
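To make the point concrete, here is a small sketch (not anyone's actual deployment script) that renders the fs.default.name property for each node of the foo1 - foo4 example. The helper name `core_site` is hypothetical; the only thing it is meant to show is that every node, slaves included, gets the master's URI.

```python
import xml.etree.ElementTree as ET

def core_site(master_uri):
    """Build a core-site.xml body whose fs.default.name points at the master."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "fs.default.name"
    ET.SubElement(prop, "value").text = master_uri
    return ET.tostring(conf, encoding="unicode")

# Master and slaves alike use the same value: the NameNode's URI on foo1.
nodes = ["foo1", "foo2", "foo3", "foo4"]
configs = {node: core_site("hdfs://foo1.domain.com") for node in nodes}
```

All four entries in configs come out identical, which is why copying the conf/ directory from the master to the slaves is enough for this property.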
>>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create /tmp /var folders on the HDFS of the slave machines that will be running only DN and TT or not? Do you still need to create hadoop/dfs/name folder on the slaves?
>> (The following is the simple answer, for non-HA non-federated HDFS.
>> You'll want to get the simple example working before trying the
>> complicated ones.)
>> No. A cluster has one namenode, running on the machine known as the master, and the admin must run "hadoop namenode -format" on that machine only.
>> In my example, I ran "hadoop namenode -format" on foo1.
>>> In hdfs-site.xml for dfs.name.dir & dfs.data.dir properties we specify /hadoop/dfs/name /hadoop/dfs/data being local linux NFS directories by running command "mkdir -p /hadoop/dfs/data"
>>> but mapred.system.dir property is to point to HDFS and not NFS since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??