Don't know about question 4 but for the first three -- the metadata is
in the memory of the namenode at runtime but is also persisted to disk
(otherwise it would be lost if you shut down and re-start the namenode).
The copy persisted to disk is on the native file system (not HDFS) and
is not automatically replicated. You have to protect your cluster by
backing it up. You now get two issues:
1. The number of files you can store is limited by the amount of memory
on the namenode
2. Since all access to files starts with getting the metadata, the
network I/O of the namenode is a possible limit
Both issues are eased by splitting the duty across 2 (or more) namenodes,
each holding only its own part of the namespace in memory, hence the
advantage of federation.
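
To put a rough number on the first issue, here is a back-of-the-envelope
sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of
namenode heap per namespace object (file or block); that figure, the 64 GiB
heap and the function names are my assumptions for illustration, not
measured values, and real usage depends on version and workload.

```python
# Rough capacity estimate only. ~150 bytes per namespace object is a
# commonly quoted rule of thumb (assumption), not a measured figure.
BYTES_PER_OBJECT = 150


def max_files(heap_bytes, blocks_per_file=1, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate number of files one namenode can track in the given heap."""
    objects_per_file = 1 + blocks_per_file  # one file entry plus its blocks
    return heap_bytes // (objects_per_file * bytes_per_object)


def federated_max_files(heap_bytes, namenodes, blocks_per_file=1):
    """Capacity if the namespace is split evenly across several namenodes."""
    return namenodes * max_files(heap_bytes, blocks_per_file)


if __name__ == "__main__":
    heap = 64 * 1024**3  # hypothetical 64 GiB of namenode heap
    print("single namenode:", max_files(heap))
    print("two namenodes:  ", federated_max_files(heap, 2))
```

The even split is the optimistic case; in practice one namespace may be
much busier or larger than the other.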
[Hopefully someone who knows will tell you about question 4]
On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote:
> Hi All,
> While trying to understand federated HDFS in detail I had few doubts
> and listing them down for your help.
> 1. In case of *_HDFS(without HDFS federation)_*, the metadata or the
> data about the blocks belonging to the files in HDFS is maintained
> in the main memory of the name node or it is stored on permanent
> storage of the namenode and is brought in the main memory on
> demand basis ?[Krishna] Based on my understanding, I assume the
> entire metadata is in main memory which is an issue by itself.
> Please correct me if my understanding is wrong.
> 2. In case of *_federated HDFS_*, the metadata or the data about the
> blocks belonging to files in a particular namespace is maintained
> in the main memory of the namenode or it is stored on the
> permanent storage of the namenode and is brought in the main
> memory on demand basis ?
> 3. Are the metadata information stored in separate cluster
> nodes(block management layer separation) as discussed in Appendix
> B of this document
> 4. I would like to know if the following proposals are already
> implemented in federated HDFS.
> (http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability slide-17)
> * Separation of namespace and block management layers (same as qn.3)
> * Partial namespace in memory for further scalability
> * Move partial namespace from one namenode to another