Consider individual RS performance when writing records with random keys?
1. Not sure if you've seen the HBaseWD project
(https://github.com/sematext/HBaseWD). It implements the "salt keys with a
prefix" approach for writing monotonically increasing row keys/time-series
data. Simplified, the idea is to add a random prefix to each row key so that
writes end up on different region servers (avoiding a single-RS hotspot).
2. When writing data to HBase with salted or random keys (so that load is
well distributed over the cluster), the write speed per RS is limited by the
slowest RS in the cluster (since each Region is served by one RS).
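To make the salting idea in point 1 concrete, here is a minimal sketch in Python. It is not HBaseWD's actual API: `salt_key`, `original_key` and `NUM_BUCKETS` are made-up names, and a one-byte prefix is just an assumption for illustration.

```python
import random

NUM_BUCKETS = 4  # hypothetical number of salt prefixes (must be <= 256 here)


def salt_key(row_key: bytes, rng: random.Random,
             num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prepend a one-byte, randomly chosen bucket prefix to the row key."""
    bucket = rng.randrange(num_buckets)
    return bytes([bucket]) + row_key


def original_key(salted_key: bytes) -> bytes:
    """Strip the one-byte prefix to recover the original row key."""
    return salted_key[1:]


if __name__ == "__main__":
    rng = random.Random()
    salted = salt_key(b"2012-05-23T21:41:00", rng)
    print(salted[0], original_key(salted))
```

Consecutive timestamps now land under different prefixes, so sequential writes are spread over several key ranges instead of always hitting the last one.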
Given 1 & 2 I got this crazy idea to:
* write in multiple threads
* assign each prefix (or interval of keys, in the case of completely random
keys) to a particular thread, so that records with this prefix are always
written by that thread
* measure how well each thread performs (e.g. its write speed)
* based on each thread's performance, salt (or randomize) keys in a biased
way, so that threads which perform better get more records to write
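The steps above could be sketched roughly like this in Python. Everything here is hypothetical: the function names, the simple "weight proportional to measured throughput" rule, and the example numbers are assumptions for illustration, not measurements or an existing HBase/HBaseWD feature. Throughputs are assumed to be positive once measured.

```python
import random


def bucket_weights(throughputs):
    """Turn per-thread throughputs (records/sec) into sampling weights."""
    total = sum(throughputs)
    if total <= 0:
        # No measurements yet: fall back to uniform salting.
        return [1.0 / len(throughputs)] * len(throughputs)
    return [t / total for t in throughputs]


def pick_bucket(weights, rng):
    """Pick a salt bucket with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def estimated_finish_time(num_records, weights, throughputs):
    """Seconds until the last bucket finishes if records are split by weight."""
    return max(num_records * w / t for w, t in zip(weights, throughputs))


if __name__ == "__main__":
    # Made-up example: three healthy writer threads and one slow one.
    throughputs = [1000.0, 1000.0, 1000.0, 250.0]
    uniform = [0.25] * 4
    biased = bucket_weights(throughputs)
    print(estimated_finish_time(13000, uniform, throughputs))
    print(estimated_finish_time(13000, biased, throughputs))
    print(pick_bucket(biased, random.Random()))
```

With these made-up numbers, uniform salting takes about 13 seconds for 13,000 records because the slow bucket holds everyone back, while the biased split finishes in about 4 seconds. In practice the throughputs would have to be re-measured periodically, since a "slow" RS may recover.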
Thus we will put less load on the RSs that are "slower", and the overall
load will be more or less balanced, which should give max write performance
for the cluster.
This might only work if each thread writes to a relatively small subset of
all the RSs, I think. Otherwise the threads will perform more or less the
same.
Am I completely crazy for thinking about this? Does it make sense to you?
Sematext :: http://blog.sematext.com/
Alex Baranau 2012-05-23, 21:41