MapReduce, mail # user - cluster set-up / a few quick questions


Re: cluster set-up / a few quick questions
Andy Isaacson 2012-10-26, 18:20
On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
> Gents,

We're not all male here. :)  I prefer "Hadoopers" or "hi all,".

> 1.
> - do you put Master's node <hostname> under fs.default.name in core-site.xml on the slave machines or slaves' hostnames?

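Master. fs.default.name names the namenode, so every machine in the
cluster uses the same value. I have a 4-node cluster, named foo1
through foo4, and my fs.default.name is hdfs://foo1.domain.com on all
of them.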
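
For reference, a minimal sketch of the core-site.xml entry this implies. The hostname is the one from my example cluster. The script writes to a scratch directory (an assumption made only to keep the sketch harmless) because the real conf directory depends on how Hadoop was installed.

```shell
#!/bin/sh
# Sketch only: write a minimal core-site.xml to a scratch directory.
# conf_dir is a throwaway location, not a real Hadoop conf directory.
conf_dir="$(mktemp -d)"

cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Same value on the master and on every slave -->
    <name>fs.default.name</name>
    <value>hdfs://foo1.domain.com</value>
  </property>
</configuration>
EOF

echo "wrote $conf_dir/core-site.xml"
```

The same file would then be copied, unchanged, to foo2 through foo4.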

> - do you need to run "sudo -u hdfs hadoop namenode -format" and create /tmp /var folders on the HDFS of the slave machines that will be running only DN and TT or not? Do you still need to create hadoop/dfs/name folder on the slaves?

(The following is the simple answer, for non-HA non-federated HDFS.
You'll want to get the simple example working before trying the
complicated ones.)

No. A cluster has one namenode, running on the machine known as the
master, and the admin must run "hadoop namenode -format" on that
machine only.

In my example, I ran "hadoop namenode -format" on foo1.
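
If you script your setup, one way to avoid formatting on the wrong box is a hostname guard. This is a rough sketch, not something Hadoop provides: MASTER_HOST and is_master are hypothetical names, and the "sudo -u hdfs" prefix is taken from the question, so drop it if your install doesn't use an hdfs user.

```shell
#!/bin/sh
# Hypothetical guard: only format the namenode when running on the master.
# MASTER_HOST is an assumption based on the foo1 example cluster.
MASTER_HOST="foo1.domain.com"

# Succeeds only when the given hostname is the master's.
is_master() {
  [ "$1" = "$MASTER_HOST" ]
}

# Fall back to uname -n where hostname -f is unavailable.
this_host="$(hostname -f 2>/dev/null || uname -n)"

if is_master "$this_host"; then
  # Destructive: initializes a fresh, empty HDFS namespace.
  sudo -u hdfs hadoop namenode -format
else
  echo "not the master ($this_host): skipping namenode -format"
fi
```

On foo2 through foo4 this only prints the "skipping" message.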

> 2.
> In hdfs-site.xml for dfs.name.dir & dfs.data.dir properties  we specify  /hadoop/dfs/name /hadoop/dfs/data  being  local linux NFS directories by running command "mkdir -p /hadoop/dfs/data"
> but mapred.system.dir  property is to point to HDFS and not NFS  since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
> If so and since it is exactly the same format  /far/boo/baz how does hadoop know which directory is local on NFS or HDFS?

This is very confusing, to be sure!  There are a few places where
paths are implicitly known to be on HDFS rather than a Linux
filesystem path. mapred.system.dir is one of those. This does mean
that given a string that starts with "/tmp/" you can't necessarily
know whether it's a Linux path or an HDFS path without looking at the
larger context.

In the case of mapred.system.dir, the docs are the place to check;
according to cluster_setup.html, mapred.system.dir is "Path on the
HDFS where the Map/Reduce framework stores system files".

http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
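
To make the distinction concrete, here is a rough shell sketch. The path_kind helper is hypothetical (it is not part of Hadoop); it only records which of the properties from this thread take a local Linux path and which take an HDFS path. The commented commands are the ones quoted above, plus a matching listing command for each side.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: the property name, not the
# path string, tells you which filesystem a value refers to.
path_kind() {
  case "$1" in
    dfs.name.dir|dfs.data.dir) echo "local" ;;  # Linux directories on each node
    mapred.system.dir)         echo "hdfs"  ;;  # per cluster_setup.html
    *)                         echo "unknown: check the docs" ;;
  esac
}

for prop in dfs.name.dir dfs.data.dir mapred.system.dir; do
  printf '%-18s %s\n' "$prop" "$(path_kind "$prop")"
done

# Creating and inspecting each kind (commands as quoted in this thread):
#   mkdir -p /hadoop/dfs/data                          # local Linux filesystem
#   ls -ld /hadoop/dfs/data
#   sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system   # HDFS
#   hadoop fs -ls /tmp/mapred
```

On the command line, a full URI removes the ambiguity: "hadoop fs -ls hdfs://foo1.domain.com/tmp" lists the HDFS directory, while "hadoop fs -ls file:///tmp" lists the local one.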

Hope this helps,
-andy