I agree it's unfortunately a bit complex, especially because "Block" is
sometimes used where "Replica" would be more accurate. If you find any
ambiguities like this, I think we'd happily take patches with clarifying
comments / Javadoc.
The best way to learn is to read the code, but maybe this will help a bit:
- The NameNode uses the BlocksMap to store the block -> datanode locations
mapping. The locations themselves live in the BlockInfo class, which holds
the locations of the block's replicas in its triplets array. The map is
managed by the BlockManager.
- BlockInfo is also a GSet.Element, which is used to get the set of
BlockInfo on a particular datanode. This is primarily useful when
processing block reports.
- LocatedBlock and LocatedBlocks are used in
ClientProtocol#getBlockLocations, which clients use to query the block ->
datanode mapping. It makes sense to have separate client and server Block
representations here, though the separation isn't perfectly clean.
- INodes are pretty separate from Blocks. BlockInfo has a pointer back to
the containing BlockCollection, which can be some type of INode, but that's
about all the BlockManager worries about.
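
To make the relationships in the list above a bit more concrete, here is a
toy Java model. It is only a sketch, not the real HDFS code: every class and
method name in it (ToyBlockInfo, ToyBlocksMap, ToyLocatedBlock, locate,
blocksOn) is made up for illustration, datanodes and the owning file are
plain strings, and a List stands in for the triplets array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT the real HDFS classes or their APIs.
public class BlockMappingSketch {

    /** Server-side record for one block (plays the role of BlockInfo). */
    static final class ToyBlockInfo {
        final long blockId;
        // Stand-in for the back-pointer to the containing BlockCollection.
        final String owner;
        // Stand-in for the triplets array: datanodes holding a replica.
        private final List<String> replicaNodes = new ArrayList<>();

        ToyBlockInfo(long blockId, String owner) {
            this.blockId = blockId;
            this.owner = owner;
        }

        void addReplica(String datanode) {
            if (!replicaNodes.contains(datanode)) {
                replicaNodes.add(datanode);
            }
        }

        List<String> replicas() {
            return Collections.unmodifiableList(replicaNodes);
        }
    }

    /** Client-facing answer (plays the role of LocatedBlock). */
    static final class ToyLocatedBlock {
        final long blockId;
        final List<String> locations;

        ToyLocatedBlock(long blockId, List<String> locations) {
            this.blockId = blockId;
            this.locations = locations;
        }
    }

    /** Block id -> ToyBlockInfo lookup (plays the role of BlocksMap). */
    static final class ToyBlocksMap {
        private final Map<Long, ToyBlockInfo> blocks = new HashMap<>();

        ToyBlockInfo addBlock(long blockId, String owner) {
            return blocks.computeIfAbsent(blockId, id -> new ToyBlockInfo(id, owner));
        }

        void addReplica(long blockId, String datanode) {
            ToyBlockInfo info = blocks.get(blockId);
            if (info == null) {
                throw new IllegalArgumentException("unknown block " + blockId);
            }
            info.addReplica(datanode);
        }

        /** Copies the server-side locations into a separate client-side object. */
        ToyLocatedBlock locate(long blockId) {
            ToyBlockInfo info = blocks.get(blockId);
            return info == null
                    ? null
                    : new ToyLocatedBlock(info.blockId, new ArrayList<>(info.replicas()));
        }

        /** All block ids with a replica on the given datanode (simple linear scan). */
        List<Long> blocksOn(String datanode) {
            List<Long> result = new ArrayList<>();
            for (ToyBlockInfo info : blocks.values()) {
                if (info.replicas().contains(datanode)) {
                    result.add(info.blockId);
                }
            }
            Collections.sort(result);
            return result;
        }
    }

    public static void main(String[] args) {
        ToyBlocksMap map = new ToyBlocksMap();
        map.addBlock(1L, "/user/a/file1");
        map.addBlock(2L, "/user/a/file1");
        map.addReplica(1L, "dn1");
        map.addReplica(1L, "dn2");
        map.addReplica(2L, "dn2");

        System.out.println("block 1 is on " + map.locate(1L).locations);
        System.out.println("dn2 holds blocks " + map.blocksOn("dn2"));
    }
}
```

The point of the sketch is only the shape: one server-side record per block
that knows its replica locations and its owner, a map in front of it, and a
separate object handed back to clients. The linear scan in blocksOn is a
simplification chosen for brevity, not how the NameNode does it.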
On Tue, Oct 15, 2013 at 11:18 PM, Yoonmin Nam <[EMAIL PROTECTED]> wrote:
> When we look at the HDFS source code, especially FSNamesystem, there are
> so many block-related types in use, such as Block, LocatedBlocks, and
> BlocksWithLocations. This makes the system very unclear to me.
> In addition, BlocksMap just maps Block and BlockInfo, but Block becomes
> LocatedBlock with DatanodeInfo. With several locateBlock, these become
> Also, Combining INode related classes with Block related classes makes me
> Is there anyone who let me know about the motto of this kind of complex
> structure of HDFS block management and give more specific and detail