Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HDFS >> mail # dev >> Merging Namenode Federation feature (HDFS-1052) to trunk


+
Suresh Srinivas 2011-03-03, 22:41
+
Allen Wittenauer 2011-03-12, 16:43
+
Konstantin Shvachko 2011-03-14, 17:28
+
Dhruba Borthakur 2011-03-14, 17:43
+
Konstantin Shvachko 2011-03-15, 01:12
+
Travis Crawford 2011-03-15, 05:36
+
suresh srinivas 2011-03-15, 06:19
+
Konstantin Shvachko 2011-03-16, 22:52
+
suresh srinivas 2011-03-16, 23:54
+
suresh srinivas 2011-03-16, 23:55
+
Sanjay Radia 2011-03-14, 17:57
+
Sanjay Radia 2011-03-21, 23:08
Copy link to this message
-
Re: Merging Namenode Federation feature (HDFS-1052) to trunk

On Mar 21, 2011, at 6:08 PM, Sanjay Radia wrote:

>
> On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:
>
>>
>> On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
>>
>>>
>>> To me, this series of changes looks like it is going to make
>>> running a grid much much harder for very little benefit.  In
>>> particular, I don't see the difference between running multiple NN/
>>> DN combinations verses running federation, especially with client
>>> side mount tables in play.
>>
>>
>>
>> Main difference between independent HDFS clusters and HDFS federation
>> is that in federation one can shares the storage of the DNs and the DNs.
>> There is a very detailed document that describes this on the Jira.
>>
>> If you are running a single NN and you don't need the scaling then
>> running and managing hadoop is for all practical purposes unchanged.
>>
>>
>> sanjay
>>>
>>
>
>
> Allen, not sure if I explained the difference above.
> Base on the discussion we had at the Hug, I want to clarify a few things
>
> In federation the NNs and the DNs are part of  a cluster. It is not as if a data node is willing to store blocks for any NN anywhere in the data center.
> We still expect a data center to have multiple hadoop clusters each with a set of data nodes and each cluster with 1 or more NNs.
> A DN stores block for only ONE cluster.

A few questions:
- Do we have a clear definition for a cluster?
- With the above definition, is it an error if not all DNs belong to the same set of NNs?
- With the working definition of a cluster, what namespace guarantees are given to clients?

The reason I ask is not because I oppose the idea of federations, but rather am curious of about the terminology and how it's 'advertised' to the user.  I rather like the design; it has similar ideas to a NSF project I've seen (http://www.reddnet.org/).

>
> You had asked about how one debugs a corrupt file or corrupt block.
> In the old world a file's inode contains the block ids of its blocks. There is also a mapping from block id to block location (ie which DN).
> In the federated hdfs, each block is identified by a longer block id, called the extended block id= blockPool Id + block id.
> A block pool is owned by only ONE NN.
> Hence if you are trying to locate a block then you map the extended block id to the block location (ie DN) - this is the same as before, except that the identifier
> of the block is merely longer.
>
> If you are trying to debug from the point of view of the DN:
> In federated HDFS, the blocks stored in the DN are segregated in directories by the blockPool Id.
> The block pool id can be mapped to a NN since each Block pool has only  ONE owner.
> Hence to map from a block to a particular NN is easy - the first part of the Block's longer identifier  will tell you which NN owns that block.
>

This sounds good.

Brian

+
suresh srinivas 2011-03-24, 09:28
+
Allen Wittenauer 2011-03-22, 20:11
+
suresh srinivas 2011-03-24, 09:34
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB