HBase >> mail # user >> using date as key


using date as key
Hi,
Last week I consulted the forum about HBase insertion optimization when the
key format is: date_key.
This key format is very good for efficient scans but creates a hotspot on a
single region when inserting millions of rows.

I would like to share the solution we found and get feedback on it:
1. Insert one day. After the regions split, look at the start-end row of each
server (this is done once, to see the key distribution).
2. Now, before inserting a day, programmatically create empty regions with
the start-end keys from step 1 (by creating rows in the meta table).
Assuming the row-key distribution of a day does not change dramatically, the
reducers can insert into multiple regions (thus avoiding hotspotting).
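
To make step 2 concrete, here is a minimal sketch of how the split keys for a new day could be derived from the start keys observed in step 1. It only computes the keys; it does not talk to HBase. The class name DaySplitKeys, the method forDay, and the yyyyMMdd date format used in the usage note are assumptions made for illustration, not part of our actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives pre-split keys for a new day from one observed day. */
final class DaySplitKeys {
    private DaySplitKeys() {}

    /**
     * Rewrites region start keys observed for one day so that they carry the
     * prefix of another day. Keys are assumed to look like "<date>_<rest>".
     */
    static List<String> forDay(List<String> observedStartKeys,
                               String observedDay,
                               String newDay) {
        String oldPrefix = observedDay + "_";
        List<String> result = new ArrayList<>();
        for (String key : observedStartKeys) {
            // Skip the table's empty first start key and keys from other days.
            if (!key.startsWith(oldPrefix)) {
                continue;
            }
            // Keep the part after the date and attach the new day's prefix.
            result.add(newDay + "_" + key.substring(oldPrefix.length()));
        }
        return result;
    }
}
```

For example, calling DaySplitKeys.forDay with the observed start keys "20110101_a" and "20110101_m", the observed day "20110101", and the new day "20110102" gives "20110102_a" and "20110102_m". Instead of writing rows into the meta table directly as we did, these keys could probably also be converted to bytes and handed to the split call on the HBase admin client, depending on the HBase version in use.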

Applying this method improved insert performance by a factor of 5 or so.

Lior