Availability and data replica issues in HBase
yonghu 2011-12-09, 19:40
I read some discussions on the mailing list. They mention that read and
write operations for the same data object are routed to the same
RegionServer. This strategy guarantees data consistency, but what
about availability? If this RegionServer goes down or is temporarily
unavailable, does the master assign a new RegionServer to process the
data requests, or does it just wait until that RegionServer comes back?
If the master assigns a new RegionServer, how does the new RegionServer
obtain the data?
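To make the routing idea concrete, here is a minimal sketch, assuming a simplified model in which a table is split into regions by contiguous row-key ranges and each region is hosted by exactly one server. The start keys, server names, and the `locate_server` helper are hypothetical and only illustrate why requests for the same row always land on the same RegionServer.

```python
import bisect

# Hypothetical region layout: each region owns row keys from its start key
# (inclusive) up to the next region's start key (exclusive).
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_server(row_key: str) -> str:
    """Return the (hypothetical) server hosting the region containing row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before it is the one whose range contains row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[idx]


# Reads and writes for the same row key always resolve to the same server.
print(locate_server("apple"), locate_server("house"), locate_server("zebra"))
```

Because the mapping depends only on the key, the question above becomes: what happens to this mapping when the server in it disappears.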
The second issue is load balancing. If a huge number of read and
write operations apply to only a small set of data, one RegionServer
may become a hot spot. How does HBase deal with this problem?
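The hot-spot concern can be illustrated with the same simplified key-range model (a sketch, assuming hypothetical regions, servers, and request keys): when most requests target keys inside one region's range, one server receives almost all of the load no matter how many servers exist.

```python
import bisect
from collections import Counter

# Hypothetical layout: four regions, each on its own server.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs4"]


def locate_server(row_key: str) -> str:
    """Map a row key to the server whose region range contains it."""
    return REGION_SERVERS[bisect.bisect_right(REGION_START_KEYS, row_key) - 1]


# Skewed workload: 900 requests hit one key, 100 are spread across ranges.
requests = ["hot"] * 900 + ["apple", "kiwi", "orange", "zebra"] * 25

# Count how many requests each server would receive.
load = Counter(locate_server(key) for key in requests)
print(load.most_common(1))
```

Here "hot" and "kiwi" both fall in the `g`..`n` range, so `rs2` handles 925 of the 1000 requests.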
The last question is about data replicas. HBase data is ultimately
stored in HDFS, which uses eager synchronization (pipelining) to keep
all replicas in sync. When HBase writes data into HDFS, when does HDFS
return the write acknowledgment to HBase: after just one replica has
been updated, or only after all replicas have been updated?
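The two options in this question trade latency against durability. The sketch below models them under stated assumptions only: a hypothetical three-node pipeline with made-up per-hop latencies, where each node forwards data to the next, and ack return time is ignored. It does not claim which policy HDFS actually uses; it only shows what each choice would cost.

```python
from itertools import accumulate

# Hypothetical per-hop latencies (ms) for a three-replica pipeline:
# client -> node1 -> node2 -> node3.
HOP_LATENCIES_MS = [2, 3, 4]


def ack_time_ms(hops, policy):
    """Time until the writer is acknowledged under a given (hypothetical) policy."""
    # Each node forwards to the next, so a replica has the data at the
    # cumulative sum of the hop latencies leading to it.
    arrival = list(accumulate(hops))
    if policy == "first":
        return arrival[0]   # acknowledge once one replica has the data
    if policy == "all":
        return arrival[-1]  # acknowledge only once every replica has it
    raise ValueError(f"unknown policy: {policy}")


print(ack_time_ms(HOP_LATENCIES_MS, "first"), ack_time_ms(HOP_LATENCIES_MS, "all"))
```

Acknowledging after the first replica is faster, but if that node fails before forwarding the data, an acknowledged write could be lost; waiting for all replicas avoids that at the cost of the full pipeline latency.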