HBase >> mail # user >> Consider individual RSs performance when writing records with random keys?


Consider individual RSs performance when writing records with random keys?
Hi,

1.
Not sure if you've seen the HBaseWD project
(https://github.com/sematext/HBaseWD). It implements the "salt keys with a
prefix" approach for writing data with monotonically increasing row keys
(e.g. time series). Simplified, the idea is to add a random prefix to the
row key so that writes end up on different region servers, avoiding a
single-RS hotspot.

2.
When writing data to HBase with salted or random keys (so that load is well
distributed over the cluster), the write speed per RS is effectively limited
by the slowest RS in the cluster (since each Region is served by only one
RS).
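
To make point 1 concrete, here is a minimal sketch of prefix salting on raw
byte[] row keys. It is not the HBaseWD API: the class SaltedKeys and its
methods are hypothetical names made up for illustration, and the one-byte
prefix (at most 256 buckets) is an assumption.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of HBase or HBaseWD. */
final class SaltedKeys {
    private SaltedKeys() {}

    /** Prepends a one-byte bucket prefix to the original row key. */
    static byte[] salt(byte[] originalKey, int bucket) {
        if (bucket < 0 || bucket > 255) {
            throw new IllegalArgumentException("bucket must fit in one byte: " + bucket);
        }
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }

    /** Picks the bucket uniformly at random from [0, buckets). */
    static byte[] saltRandomly(byte[] originalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]: " + buckets);
        }
        return salt(originalKey, ThreadLocalRandom.current().nextInt(buckets));
    }

    /** Strips the one-byte prefix to recover the original row key. */
    static byte[] unsalt(byte[] saltedKey) {
        return Arrays.copyOfRange(saltedKey, 1, saltedKey.length);
    }
}
```

With a random prefix the reader no longer knows which bucket a key went to,
so a scan over a key range has to cover every bucket.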

Given 1 & 2 I got this crazy idea to:
* write in multiple threads
* each prefix (or interval of keys, in the case of completely random keys)
is assigned to a particular thread, so that records with this prefix are
always written by that thread
* measure how well each thread performs (e.g. write speed)
* based on each thread's performance, salt (or randomize) keys in a biased
way, so that threads which perform better get more records to write
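
The steps above could be sketched roughly as follows. This is only an
illustration under assumptions of mine: one writer thread per bucket, "write
speed" measured as records per nanosecond of time spent writing, and a small
fixed share (EXPLORE) spread evenly so that a slow bucket still receives
some records and can be re-measured. BiasedSaltChooser and all its members
are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch: picks salt buckets in proportion to measured write speed. */
final class BiasedSaltChooser {
    /** Assumed share of records spread evenly regardless of measurements. */
    static final double EXPLORE = 0.1;

    private final AtomicLongArray records;   // records written per bucket
    private final AtomicLongArray busyNanos; // time spent writing per bucket

    BiasedSaltChooser(int buckets) {
        records = new AtomicLongArray(buckets);
        busyNanos = new AtomicLongArray(buckets);
    }

    /** Called by the thread that owns the bucket after each batch it writes. */
    void recordBatch(int bucket, long recordCount, long elapsedNanos) {
        records.addAndGet(bucket, recordCount);
        busyNanos.addAndGet(bucket, elapsedNanos);
    }

    /** Returns weights that sum to 1; faster buckets get larger weights. */
    double[] weights() {
        int n = records.length();
        double[] w = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            long nanos = busyNanos.get(i);
            w[i] = nanos > 0 ? (double) records.get(i) / nanos : 0.0;
            total += w[i];
        }
        for (int i = 0; i < n; i++) {
            // Uniform when nothing has been measured yet.
            double share = total > 0 ? w[i] / total : 1.0 / n;
            w[i] = EXPLORE / n + (1.0 - EXPLORE) * share;
        }
        return w;
    }

    /** Chooses the bucket (and therefore the thread) for the next record. */
    int chooseBucket() {
        return pick(weights(), ThreadLocalRandom.current().nextDouble());
    }

    /** Maps u in [0, 1) to a bucket index using cumulative weights. */
    static int pick(double[] weights, double u) {
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (u < cumulative) {
                return i;
            }
        }
        return weights.length - 1; // guards against floating-point rounding
    }
}
```

The counters here accumulate forever; a real writer would probably want a
sliding window or periodic reset so that the weights follow changes in RS
performance.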

Thus we would put less load on the RSs that are "slower", and the overall
load would be more or less balanced, which should give max write performance
for the cluster.
I think this might only work if each thread writes to a relatively small
subset of all RSs, though. Otherwise the threads will all perform more or
less the same.

Am I completely crazy for thinking about this? Does it make sense to you
at all?

Alex Baranau
------
Sematext :: http://blog.sematext.com/