using date as key
Last week I consulted the forum about HBase insertion optimization when the
key format is date_key.
This key format is very good for efficient scans, but it creates a hotspot on a
single region when inserting millions of rows.
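
To make the hotspot concrete, here is a minimal sketch, assuming regions are contiguous ranges of lexicographically sorted row keys. The split keys, the date format, and the function name are made up for illustration and do not come from a real cluster.

```python
from bisect import bisect_right


def region_index(split_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    split_keys must be sorted; region i covers [split_keys[i-1], split_keys[i]).
    """
    return bisect_right(split_keys, row_key)


# Hypothetical boundaries left over from earlier days.
split_keys = ["20100605_m", "20100606_m", "20100607_m"]

# Every row of a new day sorts after all existing boundaries.
new_rows = ["20100608_a", "20100608_k", "20100608_z"]
targets = {region_index(split_keys, row) for row in new_rows}
print(targets)  # a single region index: all writes hit the last region
```

Because the date prefix dominates the sort order, every row of the new day falls into the last region until that region splits.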

I would like to share the solution we found and get feedback on it:
1. Insert one day of data. After the regions split, note the start and end row of
each region (this is done once, to see the key distribution).
2. Before inserting each subsequent day, programmatically create empty regions
with the start and end keys from step 1 (by creating rows in the meta table).
Assuming the row-key distribution of a day does not change dramatically, the
reducers can then insert into multiple regions (thus avoiding hotspotting).
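
As a rough illustration of step 2, the sketch below derives the next day's region ranges from the boundaries observed on the sample day by swapping the date prefix. The function names, the "_" separator, the date format, and the sample keys are all assumptions for illustration. Creating the regions themselves (through the meta table, as described above) is not shown.

```python
def rekey_boundaries(observed, old_day, new_day, sep="_"):
    """Return the observed boundary keys with old_day's prefix replaced by new_day."""
    rekeyed = []
    for key in observed:
        day, found, rest = key.partition(sep)
        if not found or day != old_day:
            raise ValueError(f"unexpected boundary key: {key!r}")
        rekeyed.append(new_day + sep + rest)
    return sorted(rekeyed)


def region_ranges(boundaries, day, sep="_"):
    """Pair consecutive boundaries into (start, end) ranges covering one day."""
    first = day + sep                # sorts before every key of that day
    last = day + chr(ord(sep) + 1)   # sorts after every key of that day
    edges = [first] + list(boundaries) + [last]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical boundaries seen after the sample day's regions split.
observed = ["20100607_g", "20100607_p"]
next_day = rekey_boundaries(observed, "20100607", "20100608")
for start, end in region_ranges(next_day, "20100608"):
    print(start, "->", end)
```

Each printed pair is the start and end key of one empty region to create before loading that day, so that writes are spread across regions from the first row.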

Applying this method improved insert performance by a factor of 5 or so.