Re: cluster set-up / a few quick questions
Andy Isaacson 2012-10-26, 21:32
On Fri, Oct 26, 2012 at 11:47 AM, Kartashov, Andy
<[EMAIL PROTECTED]> wrote:
> I successfully ran a job on a cluster on foo1 in pseudo-distributed mode and am now trying a fully distributed one.
> a. I created another instance foo2 on EC2.
It seems like you're trying to use the start-dfs.sh style startup
scripts to manually run a cluster on EC2. This is doable, but it's
not very easy due to the mismatch in expectations between EC2 style
deployments and start-dfs.sh. Setting up a manually started cluster
requires a bit of up-front work, and EC2 spin-up/spin-down cycles mean
you end up redoing that work frequently.
You might consider using Whirr (http://whirr.apache.org/) as a more
automated way of deploying Hadoop clusters on EC2.
Of course, setting up a manual cluster can be a really good way to
understand how all the parts work together, and doing it on EC2 should
work just fine.
> Installed hadoop on it and copied conf/ folder from foo1 to foo2. I created /hadoop/dfs/data folder on the local linux system on foo2.
> b. on foo1 I created file conf/slaves and added:
I'd strongly recommend being consistent with the naming: don't mix
"localhost" and DNS names. EC2 has "ec2.internal" in /etc/resolv.conf
by default, so you can "ping ip-10-42-120-3" and it should work just
fine. Then make conf/masters list your first host by name, and make
conf/slaves list all your hosts by name. Note that for small clusters,
running a DN and a NN on a single host is an acceptable compromise and
% cat conf/masters
% cat conf/slaves
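As a quick sanity check on the naming advice above, something like the following could flag loopback-style entries in a host list. This is only a sketch: check_host_names is a hypothetical helper of my own, not something that ships with Hadoop, and it only looks for "localhost" and 127.x.x.x addresses.

```shell
# Hypothetical helper (not part of Hadoop): flag loopback-style entries
# in a host list such as conf/slaves.
# Prints each offending line as "lineno:entry"; returns 1 if any were
# found, 2 if the file cannot be read, 0 otherwise.
check_host_names() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "$file: cannot read file" >&2
        return 2
    fi
    # Match "localhost" or any 127.x.x.x address as a whole line,
    # ignoring leading and trailing whitespace.
    if grep -n -E '^[[:space:]]*(localhost|127\.[0-9]+\.[0-9]+\.[0-9]+)[[:space:]]*$' "$file"; then
        echo "$file: replace loopback entries with real host names" >&2
        return 1
    fi
    return 0
}
```

Usage would be "check_host_names conf/slaves", and the same for the master host list. A clean file prints nothing and returns 0.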
You also should make sure that your user account can ssh to all the nodes:
% for h in $(cat conf/slaves); do ssh -oStrictHostKeyChecking=no $h hostname; done
- answer "yes" to any "allow untrusted certificate" messages
- if you get "permission denied" messages you'll need to set up the
- after this loop succeeds you should be able to run it again and get
a clean list of hostnames.
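Once the keys are in place, a non-interactive variant of that loop can be handy for re-checking after each EC2 spin-up. The sketch below is an assumption on my part rather than anything Hadoop provides: check_ssh is a hypothetical name, and the SSH variable exists only so the loop can be exercised with a stub instead of the real client.

```shell
# Hypothetical helper: try a non-interactive ssh to every host in a list
# and report which ones fail. SSH defaults to the real ssh client.
SSH=${SSH:-ssh}

check_ssh() {
    hosts_file=$1
    failed=0
    while IFS= read -r h; do
        # Skip blank lines and comment lines.
        case $h in ''|'#'*) continue ;; esac
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from swallowing the rest of the host list on stdin.
        if "$SSH" -n -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname >/dev/null 2>&1; then
            echo "ok: $h"
        else
            echo "FAILED: $h"
            failed=1
        fi
    done < "$hosts_file"
    return $failed
}
```

Running "check_ssh conf/slaves" prints one line per host and returns non-zero if any host could not be reached without a prompt.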
> At this point I cannot find an answer on what to do next.
> I started NN, DN, SNN, JT, TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed # of datanodes as 1. I was expecting DN and TT on foo2 to be started by foo1. But it didn’t happen, so I started them myself and tried the command again. Still one DN.
You don't need to start the daemons individually, and doing so is very
difficult to get right. I virtually never do so -- I use the
start-dfs.sh and start-mapred.sh scripts to start the daemons (NN, DN,
JT, TT, etc). The "masters" and "slaves" config files are parsed by the
start-*.sh scripts, not by the daemons themselves. And the daemons don't start
themselves -- for a manual cluster, the start-*.sh scripts are
responsible. (In a production deployment such as CDH, there is a
/etc/init.d script which is managed by the distro packaging to start
and manage the daemons.)
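To make that division of labour concrete, here is a much-simplified sketch of the kind of loop the start-*.sh scripts run over conf/slaves: read the host list, then launch the daemon on each host over ssh. It is an illustration only, not the actual Hadoop script. The function name start_on_slaves, the SSH variable and the default HADOOP_HOME path are my assumptions; bin/hadoop-daemon.sh is the per-host launcher from the Hadoop 1.x tarball layout.

```shell
# Simplified illustration of the slaves-file loop (not the real Hadoop script).
# The HADOOP_HOME default is an assumption; adjust it to your install.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
SSH=${SSH:-ssh}

start_on_slaves() {
    slaves_file=$1
    daemon=$2    # e.g. datanode or tasktracker
    # Strip comments and blank lines, then visit each host in turn.
    for h in $(sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$slaves_file"); do
        echo "starting $daemon on $h"
        # -n keeps ssh from reading stdin while running the remote command.
        "$SSH" -n "$h" "$HADOOP_HOME/bin/hadoop-daemon.sh" start "$daemon"
    done
}
```

In practice you would not write this yourself: run bin/start-dfs.sh and bin/start-mapred.sh on the master, then check "hadoop dfsadmin -report" to see how many datanodes registered.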