

Kartashov, Andy 2012-11-02, 16:35
Re: Hadoop - cluster set-up (for DUMMIES)... or how I did it
Hello Andy,

        Thank you for sharing your experience with us. I would just like
to add that it is always good to include the "dfs.name.dir" and "dfs.data.dir"
properties in the hdfs-site.xml file to make sure that everything runs
smoothly, as /tmp gets emptied at each restart, so there is always a chance
of losing the data and meta info. It's also good to set "hadoop.tmp.dir" in
core-site.xml, as it also defaults to /tmp.
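
For reference, the relevant snippets could look something like this (the
/data/1/... paths below are only placeholders; point them at persistent
disks on your own machines):

In hdfs-site.xml:
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/name</value>      <!-- example path; NameNode metadata -->
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/data</value>      <!-- example path; DataNode blocks -->
</property>

And in core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/1/hadoop-tmp</value>    <!-- example path; scratch space -->
</property>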

Regards,
    Mohammad Tariq

On Fri, Nov 2, 2012 at 10:05 PM, Kartashov, Andy <[EMAIL PROTECTED]> wrote:

> Hello Hadoopers,
>
> After weeks of struggle and numerous rounds of error debugging, I finally
> managed to set up a fully distributed cluster. I decided to share my
> experience with the newcomers.
>  In case the experts on here disagree with some of the facts mentioned
> herein, feel free to correct me or add your comments.
>
> Example Cluster Topology:
> Node 1 – NameNode+JobTracker
> Node 2 – SecondaryNameNode
> Node 3, 4, .., N – DataNodes 1,2,..N+TaskTrackers 1,2,..N
>
> Configuration set-up after you have installed Hadoop:
>
> Firstly, you will need to find the host address of each of your Nodes
> by running:
> $ hostname -f
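> On an EC2 box, for example, this prints something like
> ip-10-62-62-235.ec2.internal (the internal hostname I use below).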
>
> Your /etc/hadoop/ folder contains subfolders with your configuration files.
>  Your installation will create a default folder, conf.empty. Copy it to, say,
> conf.cluster and make sure your soft link conf -> points to conf.cluster.
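> For example (assuming the standard /etc/hadoop layout of a packaged
> install):
> $ sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.cluster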
>
> You can see what it currently points to by running:
> $ alternatives --display hadoop-conf
>
> Make a new link and set it to point to conf.cluster:
> $ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf
> /etc/hadoop/conf.cluster 50
> $ sudo alternatives --set hadoop-conf /etc/hadoop/conf.cluster
> Run the display command again to verify the configuration:
> $ alternatives --display hadoop-conf
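> If everything worked, the output should include a line like this (the
> exact wording varies between distros):
> hadoop-conf - status is manual.
>  link currently points to /etc/hadoop/conf.cluster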
>
> Let's go inside conf.cluster:
> $ cd conf.cluster/
>
> As a minimum, we will need to modify the following files:
> 1.      core-site.xml
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://<host-name>:8020/</value> # the host-name of your
> NameNode (Node 1), which you found with "hostname -f" above
>   </property>
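> For clarity, the complete file on my cluster looks roughly like this
> (substitute your own NameNode hostname):
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://ip-10-62-62-235.ec2.internal:8020/</value>
>   </property>
> </configuration>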
>
> 2.      mapred-site.xml
>   <property>
>     <name>mapred.job.tracker</name>
>     <!-- <value><host-name>:8021</value> --> # the host-name of your
> NameNode (Node 1) as well, since we intend to run the NameNode and JobTracker
> on the same machine; note the value is plain host:port, with no hdfs:// scheme
>     <value>ip-10-62-62-235.ec2.internal:8021</value>
>   </property>
>
> 3.      masters # if this file doesn't exist yet, create it and add one
> line:
> <host-name> # the host-name of your Node 2, running the SecondaryNameNode
>
> 4.      slaves # if this file doesn't exist yet, create it and add your
> host-names (one per line):
> <host-name> # the host-name of your Node 3, running DataNode 1
> <host-name> # the host-name of your Node 4, running DataNode 2
> ....
> <host-name> # the host-name of your Node N, running DataNode N
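> As a concrete illustration (all of these EC2-style hostnames are made up):
> masters:
> ip-10-62-62-236.ec2.internal
> slaves:
> ip-10-62-62-237.ec2.internal
> ip-10-62-62-238.ec2.internal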
>
>
> 5.      If you are not comfortable touching hdfs-site.xml, no problem:
> after you format your NameNode, it will create the dfs/name, dfs/data, etc.
> folder structure under the local Linux default /tmp/hadoop-hdfs/ directory.
> You can later change this to a different location by editing
> hdfs-site.xml, but please first learn the file structure/permissions/owners
> of those directories (dfs/data, dfs/name, dfs/namesecondary, etc.) that were
> created for you by default.
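> A quick way to do that is something like:
> $ sudo ls -lR /tmp/hadoop-hdfs/dfs
> and to note which user (typically hdfs) owns the name, data and
> namesecondary subdirectories before re-pointing them anywhere else.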
>
> Let's format the HDFS namespace (note we format it as the hdfs user):
> $ sudo -u hdfs hadoop namenode -format
> NOTE - you only run this command ONCE, and on the NameNode only!
>
> I only added the following property to my hdfs-site.xml on the NameNode
> (Node 1) for the SecondaryNameNode to use:
>
> <property>
>   <name>dfs.namenode.http-address</name>
>   <value>namenode.host.address:50070</value>   # I changed this to
> 0.0.0.0:50070 for the EC2 environment
>   <description>
>     Needed for running the SNN.
>     The address and the base port on which the dfs NameNode Web UI will
>     listen.
>   </description>
> </property>
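>
> Once your daemons are up, a quick sanity check is to hit the NameNode
> Web UI on that port from another node (or a browser, given the 0.0.0.0
> binding above):
> $ curl http://<namenode-host>:50070/
> and confirm your DataNodes are listed as live nodes.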