Suresh Srinivas 2011-03-03, 22:41
Allen Wittenauer 2011-03-12, 16:43
Konstantin Shvachko 2011-03-14, 17:28
Dhruba Borthakur 2011-03-14, 17:43
Konstantin Shvachko 2011-03-15, 01:12
Travis Crawford 2011-03-15, 05:36
suresh srinivas 2011-03-15, 06:19
Konstantin Shvachko 2011-03-16, 22:52
suresh srinivas 2011-03-16, 23:54
suresh srinivas 2011-03-16, 23:55
Sanjay Radia 2011-03-14, 17:57
On Mar 14, 2011, at 10:57 AM, Sanjay Radia wrote:
> On Mar 12, 2011, at 8:43 AM, Allen Wittenauer wrote:
>> To me, this series of changes looks like it is going to make
>> running a grid much much harder for very little benefit. In
>> particular, I don't see the difference between running multiple NN/
>> DN combinations verses running federation, especially with client
>> side mount tables in play.
> Main difference between independent HDFS clusters and HDFS federation
> is that in federation one can shares the storage of the DNs and the
> There is a very detailed document that describes this on the Jira.
> If you are running a single NN and you don't need the scaling then
> running and managing hadoop is for all practical purposes unchanged.
Allen, not sure if I explained the difference above.
Base on the discussion we had at the Hug, I want to clarify a few things
In federation the NNs and the DNs are part of a cluster. It is not as
if a data node is willing to store blocks for any NN anywhere in the
We still expect a data center to have multiple hadoop clusters each
with a set of data nodes and each cluster with 1 or more NNs.
A DN stores block for only ONE cluster.
You had asked about how one debugs a corrupt file or corrupt block.
In the old world a file's inode contains the block ids of its blocks.
There is also a mapping from block id to block location (ie which DN).
In the federated hdfs, each block is identified by a longer block id,
called the extended block id= blockPool Id + block id.
A block pool is owned by only ONE NN.
Hence if you are trying to locate a block then you map the extended
block id to the block location (ie DN) - this is the same as before,
except that the identifier
of the block is merely longer.
If you are trying to debug from the point of view of the DN:
In federated HDFS, the blocks stored in the DN are segregated in
directories by the blockPool Id.
The block pool id can be mapped to a NN since each Block pool has
only ONE owner.
Hence to map from a block to a particular NN is easy - the first part
of the Block's longer identifier will tell you which NN owns that
Brian Bockelman 2011-03-21, 23:25
suresh srinivas 2011-03-24, 09:28
Allen Wittenauer 2011-03-22, 20:11
suresh srinivas 2011-03-24, 09:34