MapReduce >> mail # user >> cluster set-up / a few quick questions


Re: cluster set-up / a few quick questions
On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:
> Gents,

We're not all male here. :)  I prefer "Hadoopers" or "hi all,".

> 1.
> - do you put Master's node <hostname> under fs.default.name in core-site.xml on the slave machines or slaves' hostnames?

Master.  I have a 4-node cluster, named foo1 - foo4. My
fs.default.name is hdfs://foo1.domain.com.
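As a minimal sketch, assuming the foo1 - foo4 naming above, the same
core-site.xml entry would go on every node, master and slaves alike.
The hostname comes from the example in this message; no port is shown
because none is given here.

```xml
<?xml version="1.0"?>
<!-- core-site.xml: identical on foo1 (master) and on foo2 - foo4 (slaves) -->
<configuration>
  <property>
    <!-- Points at the namenode on the master, never at a slave's own hostname -->
    <name>fs.default.name</name>
    <value>hdfs://foo1.domain.com</value>
  </property>
</configuration>
```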

> - do you need to run "sudo -u hdfs hadoop namenode -format" and create /tmp /var folders on the HDFS of the slave machines that will be running only DN and TT or not? Do you still need to create hadoop/dfs/name folder on the slaves?

(The following is the simple answer, for non-HA, non-federated HDFS.
You'll want to get the simple case working before trying the
complicated ones.)

No. A cluster has one namenode, running on the machine known as the
master, and the admin must run "hadoop namenode -format" on that
machine only.

In my example, I ran "hadoop namenode -format" on foo1.

> 2.
> In hdfs-site.xml for dfs.name.dir & dfs.data.dir properties  we specify  /hadoop/dfs/name /hadoop/dfs/data  being  local linux NFS directories by running command "mkdir -p /hadoop/dfs/data"
> but mapred.system.dir  property is to point to HDFS and not NFS  since we are running "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"??
> If so and since it is exactly the same format  /far/boo/baz how does hadoop know which directory is local on NFS or HDFS?

This is very confusing, to be sure!  There are a few places where a
path is implicitly an HDFS path rather than a Linux filesystem
path, and mapred.system.dir is one of those. This does mean that given
a string that starts with "/tmp/", you can't necessarily know whether
it's a Linux path or an HDFS path without looking at the larger
context.

In the case of mapred.system.dir, the docs are the place to check;
according to cluster_setup.html, mapred.system.dir is "Path on the
HDFS where the Map/Reduce framework stores system files".

http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
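To make the distinction concrete, here is a rough sketch of the two
config files using the paths from the question. The values look alike,
but the property name decides which filesystem each one refers to.
The file layout is an assumption for illustration, not a copy of any
real cluster's config.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: both values are directories on the local Linux disk -->
<configuration>
  <property>
    <!-- Local directory where the namenode keeps its metadata -->
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name</value>
  </property>
  <property>
    <!-- Local directory where each datanode stores its blocks -->
    <name>dfs.data.dir</name>
    <value>/hadoop/dfs/data</value>
  </property>
</configuration>
```

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: this value is a path inside HDFS, not on local disk -->
<configuration>
  <property>
    <!-- Created with "hadoop fs -mkdir", not with plain "mkdir" -->
    <name>mapred.system.dir</name>
    <value>/tmp/mapred/system</value>
  </property>
</configuration>
```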

Hope this helps,
-andy