What documentation are you reading? This is pretty accurate:
The NameNode is indeed responsible for the metadata, and all the DataNodes
report to the NameNode (so they are all slaves). You are right, the data
blocks are stored on the DataNodes. Perhaps I am lacking knowledge of the
history, but as of now there's no "master server". Read and write requests
on files go to the NameNode first; it tells the client which DataNodes hold
(or should receive) the blocks, and the client then transfers the data
directly to or from those DataNodes.
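To make that flow concrete, here is a rough sketch in Python. It is a toy
in-memory model, not the Hadoop API: the class and function names (NameNode,
DataNode, write_file, read_file), the round-robin placement and the tiny block
size are all made up for illustration. Real HDFS placement is rack-aware and
writes are pipelined from one DataNode to the next.

```python
# Toy in-memory model of the HDFS read/write path. Not the Hadoop API:
# every class and function name here is hypothetical.
from itertools import cycle


class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}  # path -> list of (block id, [DataNode, ...])
        self._next = cycle(range(len(datanodes)))

    def allocate_block(self, path):
        # Pick `replication` distinct DataNodes, round-robin
        # (a simplification of the real placement policy).
        start = next(self._next)
        count = min(self.replication, len(self.datanodes))
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(count)]
        blocks = self.files.setdefault(path, [])
        block_id = f"{path}#{len(blocks)}"
        blocks.append((block_id, targets))
        return block_id, targets

    def block_locations(self, path):
        return self.files[path]


def write_file(namenode, path, data, block_size=4):
    for offset in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        for dn in targets:  # data goes to DataNodes, never to the NameNode
            dn.blocks[block_id] = data[offset:offset + block_size]


def read_file(namenode, path):
    chunks = []
    for block_id, holders in namenode.block_locations(path):
        chunks.append(holders[0].blocks[block_id])  # any replica will do
    return b"".join(chunks)


datanodes = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode(datanodes, replication=3)
write_file(nn, "/tmp/hello", b"hello hdfs")
print(read_file(nn, "/tmp/hello"))
```

The point of the sketch is that the NameNode object never touches the file
contents; it only answers "which DataNodes?" and the client does the rest.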
So your configuration for replication factor 3 would look like:
in conf/core-site.xml:
fs.default.name = hdfs://machineAAA:54321/
in conf/hdfs-site.xml:
dfs.replication = 3 (this is also the default)
....possibly a lot more
Hope this helps
On Thu, May 24, 2012 at 6:30 AM, Chao Huang <[EMAIL PROTECTED]> wrote:
> Hello experts,
> I'm new to hdfs/hadoop. After reading the hdfs documents, I'm getting
> confused by the differences between a namenode and a master server. It's
> my understanding that the namenode is responsible for managing metadata,
> while the master-replica group (which is composed of a number of
> datanodes) stores the actual data blocks. In the master-replica group, the
> master server accepts read/write requests, and load balances (or routes)
> read requests to the appropriate replica. In other words, we should
> configure the namenode and master server on two different physical machines
> in a production environment, right? Is this a correct assumption?
> One other question about HDFS cluster setup:
> - requirements: one namenode, replication factor = 3, in a production
> environment, what would the topology look like? Can I configure as follows?
> in conf/core-site.xml:
> fs.default.name = hdfs://machineAAA:54321/
> in conf/masters:
> in conf/slaves:
> Can someone please confirm and/or comment?
> Sorry for my newbie questions. Thanks for the help.