Re: Regarding design of HDFS
Aaron Eng 2011-09-13, 04:16
>> The only way to avoid this is to make the data much more cacheable and to
>> have a viable cache coherency strategy. Cache coherency at the meta-data
>> level is difficult. Cache coherency at the block level is also difficult
>> (but not as difficult) because many blocks get moved for balance purposes.
> I would argue that a federation-model is much more scalable, elegant and
> easier to maintain. It takes a very well-oiled building block like the
> NameNode and allows you to use multitudes of them in a single This is
> already part of HDFS trunk code base.
I think Sesha's questions are about the memory footprint of the NameNode and
why it has to operate the way it does: why isn't it possible to implement the
existing NameNode capabilities in a design that uses less memory? I think
the motivation for that question is the assessment that the resource
utilization profile of the NameNode is inefficient: it occupies large
amounts of memory, but in many use cases much of that memory is accessed
infrequently. I do not think his question was about how to further scale
out an inefficient system. I would argue that federation is a way of taking
an inefficient system and scaling it out such that the system is larger but
has the same proportion of inefficiency. I don't think it addresses the
problem Sesha is asking about.
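
To put rough numbers on the "same proportion" point, here is a back-of-the-envelope
sketch. Everything in it is an assumption for illustration, not a measurement:
the 150 bytes per namespace object is only a commonly quoted rule of thumb, the
10% "hot" share is made up, and the helper heap_profile is hypothetical. It also
assumes federation splits the namespace evenly and that every NameNode sees the
same hot share.

```python
# Assumed rule of thumb for heap used per file/directory/block object.
# This is an illustrative figure, not a measured value.
BYTES_PER_OBJECT = 150


def heap_profile(num_objects, hot_percent, num_namenodes=1):
    """Return (heap bytes, cold bytes, cold percent) for ONE NameNode.

    Assumes the namespace is split evenly across num_namenodes and that
    each NameNode has the same share of frequently accessed metadata.
    """
    objects_per_nn = num_objects // num_namenodes
    heap = objects_per_nn * BYTES_PER_OBJECT
    hot = heap * hot_percent // 100
    cold = heap - hot
    return heap, cold, 100 * cold // heap


# One NameNode holding 100 million objects, 10% of them "hot" (assumed).
single = heap_profile(100_000_000, hot_percent=10)

# The same namespace federated across 4 NameNodes.
federated = heap_profile(100_000_000, hot_percent=10, num_namenodes=4)

# Cold memory summed over the whole federated cluster.
total_cold_federated = 4 * federated[1]

print(single)                # per-NameNode figures, single NameNode
print(federated)             # per-NameNode figures, 4-way federation
print(total_cold_federated)  # cluster-wide cold bytes under federation
```

Under these assumptions each federated NameNode needs a quarter of the heap,
but the cold share per NameNode is unchanged, and the cold bytes summed over
the cluster are the same as before. Federation moves the footprint around; it
does not shrink it.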
The short answer, as you've probably gathered, is that it is difficult to
adapt the existing HDFS code base to support the type of model you are