HDFS >> mail # dev >> Merging Namenode Federation feature (HDFS-1052) to trunk


Suresh Srinivas 2011-03-03, 22:41
Allen Wittenauer 2011-03-12, 16:43
Konstantin Shvachko 2011-03-14, 17:28
Dhruba Borthakur 2011-03-14, 17:43
Konstantin Shvachko 2011-03-15, 01:12
Travis Crawford 2011-03-15, 05:36
suresh srinivas 2011-03-15, 06:19
Konstantin Shvachko 2011-03-16, 22:52
suresh srinivas 2011-03-16, 23:54
suresh srinivas 2011-03-16, 23:55
Sanjay Radia 2011-03-14, 17:57
Re: Merging Namenode Federation feature (HDFS-1052) to trunk

On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:

>
> On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
>
>>
>> To me, this series of changes looks like it is going to make
>> running a grid much, much harder for very little benefit.  In
>> particular, I don't see the difference between running multiple
>> NN/DN combinations versus running federation, especially with
>> client-side mount tables in play.
>
>
>
> The main difference between independent HDFS clusters and HDFS
> federation is that in federation one can share the DNs and their
> storage.
> There is a very detailed document on the Jira that describes this.
>
> If you are running a single NN and you don't need the scaling, then
> running and managing Hadoop is for all practical purposes unchanged.
>
>
> sanjay
>>
>
Allen, I am not sure I explained the difference above.
Based on the discussion we had at the HUG, I want to clarify a few things.

In federation the NNs and the DNs are part of a cluster. It is not as
if a data node is willing to store blocks for any NN anywhere in the
data center.
We still expect a data center to have multiple Hadoop clusters, each
with a set of data nodes and each cluster with one or more NNs.
A DN stores blocks for only ONE cluster.

You had asked about how one debugs a corrupt file or corrupt block.
In the old world a file's inode contains the block ids of its blocks.
There is also a mapping from block id to block location (i.e., which DN).
In federated HDFS, each block is identified by a longer block id,
called the extended block id = block pool id + block id.
A block pool is owned by only ONE NN.
Hence if you are trying to locate a block, you map the extended
block id to the block location (i.e., the DN). This is the same as
before, except that the identifier of the block is merely longer.
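
As a rough illustration of the lookup described above (a sketch only,
not actual HDFS code): the name ExtendedBlockId, the block_locations
table, and the pool and DN names are all made up for the example. It
shows that two block pools can reuse the same block id without
colliding, because the lookup key is the pair.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExtendedBlockId:
    """Hypothetical model of the longer identifier: block pool id + block id."""
    block_pool_id: str
    block_id: int


# Hypothetical block map: extended block id -> data nodes holding a replica.
# The same block id (1001) appears in two pools and stays distinct.
block_locations: Dict[ExtendedBlockId, List[str]] = {
    ExtendedBlockId("pool-A", 1001): ["dn1", "dn2"],
    ExtendedBlockId("pool-B", 1001): ["dn2", "dn3"],
}


def locate(block: ExtendedBlockId) -> List[str]:
    """Return the data nodes that hold the block, or an empty list if unknown."""
    return block_locations.get(block, [])
```

The lookup itself is the same as with a plain block id; only the key
is longer.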

If you are trying to debug from the point of view of the DN:
In federated HDFS, the blocks stored in the DN are segregated into
directories by block pool id.
The block pool id can be mapped to an NN since each block pool has
only ONE owner.
Hence mapping from a block to a particular NN is easy: the first part
of the block's longer identifier tells you which NN owns that
block.
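
The DN-side mapping could be sketched roughly as follows. This assumes
an illustrative layout of <storage root>/<block pool id>/.../<block file>;
the real on-disk layout may differ, and the pool_owner table, the
function name, and the paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical ownership table: each block pool has exactly one owning NN.
pool_owner: Dict[str, str] = {"pool-A": "nn1", "pool-B": "nn2"}


def owner_of_block_file(storage_root: str, block_file: str) -> str:
    """Map a block file on a DN to the NN that owns its block pool.

    Assumes the first directory under storage_root is the block pool id.
    """
    relative = PurePosixPath(block_file).relative_to(storage_root)
    block_pool_id = relative.parts[0]
    return pool_owner[block_pool_id]
```

Because a block pool has a single owner, the table lookup is
unambiguous: one directory name leads to one NN.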
sanjay
Brian Bockelman 2011-03-21, 23:25
suresh srinivas 2011-03-24, 09:28
Allen Wittenauer 2011-03-22, 20:11
suresh srinivas 2011-03-24, 09:34