MapReduce user mailing list - cluster set-up / a few quick questions


Re: cluster set-up / a few quick questions - SOLVED
Nitin Pawar 2012-10-27, 05:40
Hi Andy,

you should definitely give Whirr a try for Hadoop on AWS. It takes care of
this kind of EC2 setup for you and works smoothly.
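A minimal sketch of what that looks like, assuming the standard Whirr hadoop
recipe (the cluster name and instance counts are just examples; double-check
the property names against the Whirr docs):

  # hadoop.properties
  whirr.cluster-name=testcluster
  whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
  whirr.provider=aws-ec2
  whirr.identity=${env:AWS_ACCESS_KEY_ID}
  whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

  whirr launch-cluster --config hadoop.properties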

Thanks,
nitin

On Sat, Oct 27, 2012 at 1:25 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
> Hadoopers,
>
> The problem was in EC2 security.  While I could ssh passwordlessly into the other node and back, I could not telnet to it because of the EC2 firewall.  I needed to open the ports for the NN and JT in the security group.  :)
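> For anyone who hits the same thing: opening those ports looks roughly like this with the AWS command-line tools (the security-group name, CIDR and port numbers here are placeholders, use whatever your fs.default.name and mapred.job.tracker actually listen on):
>
>   # NameNode RPC port (fs.default.name)
>   aws ec2 authorize-security-group-ingress --group-name hadoop --protocol tcp --port 8020 --cidr 10.0.0.0/16
>   # JobTracker RPC port (mapred.job.tracker)
>   aws ec2 authorize-security-group-ingress --group-name hadoop --protocol tcp --port 8021 --cidr 10.0.0.0/16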
>
> Now I can see 2 DNs when running "hadoop fsck", and can also "hadoop fs -ls" against the NN from the slave. Sweet!!!
>
> Is it possible to balance data over the DNs without re-copying it with the "hadoop fs -put" command? I read about bin/start-balancer.sh somewhere but cannot find it in my current hadoop installation.
> Besides, is balancing data over the DNs going to improve the performance of an MR job?
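> If there is an equivalent I can run directly, I am guessing it is something along the lines of
>
>   sudo -u hdfs hadoop balancer -threshold 10
>
> (with -threshold being the allowed spread in disk usage, in percent), but please correct me if that is wrong.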
>
> Cheers,
> Happy Hadooping.
>
> -----Original Message-----
> From: Nitin Pawar [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 26, 2012 3:18 PM
> To: [EMAIL PROTECTED]
> Subject: Re: cluster set-up / a few quick questions
>
> A few questions:
>
> 1) Have you set up passwordless ssh between both hosts for the user who owns the hadoop processes (or root)? (See the sketch below.)
> 2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
> 3) If you started them one by one, there is no reason a command run on one node would execute anything on the other.
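> A minimal sketch of the passwordless-ssh setup, assuming the daemons run as a user named "hadoop" and foo2 resolves by name from foo1:
>
>   ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # on foo1, as the hadoop user
>   ssh-copy-id hadoop@foo2                    # push the public key to foo2
>   ssh hadoop@foo2 hostname                   # should print foo2 with no password prompt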
>
>
> On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>> Andy, many thanks.
>>
>> I am stuck here now, so please point me in the right direction.
>>
>> I successfully ran a job on foo1 in pseudo-distributed mode and am now trying to get a fully-distributed setup working.
>>
>> a. I created another instance, foo2, on EC2, installed hadoop on it, and copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the local Linux filesystem on foo2.
>>
>> b. On foo1 I created the file conf/slaves and added:
>> localhost
>> <hostname-of-foo2>
>>
>> At this point I cannot find an answer on what to do next.
>>
>> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed the number of datanodes as 1.  I was expecting the DN and TT on foo2 to be started by foo1, but that didn't happen, so I started them myself and tried the command again. Still one DN.
>> I realise that foo2 has no data at this point, but I could not find the bin/start-balancer.sh script to help me balance data over from foo1 to foo2.
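>> (For reference, starting them by hand on foo2 looked roughly like this; the path is an assumption, adjust to wherever hadoop is installed:
>>
>>   $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
>>   $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker)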
>>
>> What do I do next?
>>
>> Thanks
>> AK
>>
>> -----Original Message-----
>> From: Andy Isaacson [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, October 26, 2012 2:21 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: cluster set-up / a few quick questions
>>
>> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
>>> Gents,
>>
>> We're not all male here. :)  I prefer "Hadoopers" or "hi all,".
>>
>>> 1.
>>> - do you put the master node's <hostname> under fs.default.name in core-site.xml on the slave machines, or the slaves' hostnames?
>>
>> Master.  I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
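>> In core-site.xml terms (the same value on every node; the domain is just my example):
>>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://foo1.domain.com</value>
>>   </property>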
>>
>>> - do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp and /var folders in HDFS for the slave machines that will be running only a DN and TT, or not? Do you still need to create the hadoop/dfs/name folder on the slaves?
>>
>> (The following is the simple answer, for non-HA non-federated HDFS.
>> You'll want to get the simple example working before trying the
>> complicated ones.)
>>
>> No. A cluster has one namenode, running on the machine known as the master, and the admin must run "hadoop namenode -format" on that machine only.
>>
>> In my example, I ran "hadoop namenode -format" on foo1.
>>
>>> 2.
>> In hdfs-site.xml, the dfs.name.dir & dfs.data.dir properties point at /hadoop/dfs/name and /hadoop/dfs/data, which are plain local Linux directories we create by running "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property is supposed to point to a path inside HDFS, not the local filesystem, since we create it by running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
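>> My reading of it, with the paths from this example (please correct me if this is wrong):
>>
>>   <!-- hdfs-site.xml: directories on each node's own local disk -->
>>   <property><name>dfs.name.dir</name><value>/hadoop/dfs/name</value></property>
>>   <property><name>dfs.data.dir</name><value>/hadoop/dfs/data</value></property>
>>
>>   <!-- mapred-site.xml: a path inside HDFS, created with "hadoop fs -mkdir" -->
>>   <property><name>mapred.system.dir</name><value>/tmp/mapred/system</value></property>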
