Re: Regarding design of HDFS
Ted Dunning 2011-09-13, 11:20
2011/9/13 kang hua <[EMAIL PROTECTED]>
> Hi Master:
> can you explain more detail --- "The only way to avoid this is to
> make the data much more cacheable and to have a viable cache coherency
> strategy. Cache coherency at the meta-data level is difficult. Cache
> coherency at the block level is also difficult (but not as difficult)
> because many blocks get moved for balance purposes"
> why "Cache coherency at the meta-data level is difficult" ?
I said this because meta-data is updated often. Caching data that is updated
at a high rate requires some sort of coherency model. For meta-data, it is
difficult to detect stale information at the time it is used, and using stale
information can be disastrous. Thus, caching is difficult.
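
To make the point concrete, here is a minimal sketch of a client-side
meta-data cache. Everything in it is hypothetical (NameService, CachingClient,
the path, the per-path version counter); none of it is an HDFS API. It only
illustrates that a stale meta-data entry looks exactly like a fresh one when
it is used, and that checking it means asking the authoritative service again.

```python
class NameService:
    """Hypothetical authoritative meta-data store (stand-in for a master node)."""

    def __init__(self):
        self.meta = {}  # path -> (version, length)

    def update(self, path, length):
        # Every update bumps the version of that path.
        version = self.meta.get(path, (0, 0))[0] + 1
        self.meta[path] = (version, length)

    def lookup(self, path):
        return self.meta[path]


class CachingClient:
    """Hypothetical client that caches meta-data with no coherency protocol."""

    def __init__(self, service):
        self.service = service
        self.cache = {}

    def length(self, path):
        # Served from the cache after the first lookup; nothing signals staleness.
        if path not in self.cache:
            self.cache[path] = self.service.lookup(path)
        return self.cache[path][1]

    def is_stale(self, path):
        # Detecting staleness needs the same round trip the cache was meant to save.
        return self.cache[path][0] != self.service.lookup(path)[0]


service = NameService()
service.update("/logs/a", 100)
client = CachingClient(service)

first = client.length("/logs/a")   # fills the cache
service.update("/logs/a", 250)     # meta-data changes behind the client's back
second = client.length("/logs/a")  # still the old value, and no error is raised
```

In this sketch the second read silently returns the old length. A real design
would need leases, invalidation callbacks, or versions checked by the server,
which is the coherency model mentioned above.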
> why "Cache coherency at the block level is also difficult (but not as
> difficult) because many blocks get moved for balance purposes"
The basic problem here is the update rate. Late detection of stale
information is much easier, however, since you can just note that the block
isn't where you thought it was and update your cache. There are still
problems, and the fact that race conditions are still being found in the HDFS
lease management code is an indicator that this isn't a completely trivial
problem.
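
The "note that the block isn't where you thought it was and update your
cache" idea can be sketched roughly as follows. All names here (DataNode,
LocationService, BlockClient, BlockNotHere, "blk_1") are hypothetical and are
not the HDFS client API. The sketch assumes a data node reports an explicit
error when asked for a block it no longer holds.

```python
class BlockNotHere(Exception):
    """Raised by a (hypothetical) data node that does not hold the block."""


class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read(self, block_id):
        if block_id not in self.blocks:
            raise BlockNotHere(block_id)
        return self.blocks[block_id]


class LocationService:
    """Hypothetical authoritative map of block locations."""

    def __init__(self):
        self.locations = {}  # block_id -> DataNode
        self.lookups = 0

    def locate(self, block_id):
        self.lookups += 1
        return self.locations[block_id]

    def move(self, block_id, dst):
        # Simulates a balancer moving a block to another node.
        src = self.locations[block_id]
        dst.blocks[block_id] = src.blocks.pop(block_id)
        self.locations[block_id] = dst


class BlockClient:
    def __init__(self, service):
        self.service = service
        self.cache = {}  # block_id -> DataNode, possibly stale

    def read(self, block_id):
        node = self.cache.get(block_id)
        if node is None:
            node = self.cache[block_id] = self.service.locate(block_id)
        try:
            return node.read(block_id)
        except BlockNotHere:
            # Stale entry detected on use: refresh the cache and retry once.
            node = self.cache[block_id] = self.service.locate(block_id)
            return node.read(block_id)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
service = LocationService()
dn1.blocks["blk_1"] = b"data"
service.locations["blk_1"] = dn1
client = BlockClient(service)

before = client.read("blk_1")  # one lookup, then cached
service.move("blk_1", dn2)     # block is rebalanced to dn2
after = client.read("blk_1")   # miss on dn1, refresh, read from dn2
again = client.read("blk_1")   # served via the refreshed cache entry
```

The single retry is a deliberate simplification. If the block moves again
between the refresh and the second read, this sketch just fails, which is a
small instance of the kind of race mentioned above.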