HBase, mail # user - md5 hash key and splits


Re: md5 hash key and splits
Mohit Anchlia 2012-08-31, 14:55
On Thu, Aug 30, 2012 at 11:52 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <[EMAIL PROTECTED]>
> wrote:
> > In general isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> >
>
> Time series data is a particular case [1] and the sematextians have
> tools to help w/ that particular loading pattern.  Is time series your
> loading pattern?  If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting.  But
> hotspotting is an issue apart from splits; you can split all you want
> and if your row keys are time series, splitting won't undo the hotspotting.
>

My data is time series. To get a random distribution across regions while
still keeping all of a user's keys in the same region, I am thinking of using
md5(userid)+reversetimestamp as the row key. But with this type of key, how
can one pre-split? I have 30 nodes.

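
One possible approach, sketched here under stated assumptions rather than taken from the thread: because an MD5 digest is close to uniformly distributed, the split points can be computed by dividing the leading bytes of the key space into equal ranges. The function names (`row_key`, `split_points`, `region_index`), the 4-byte split prefix, the 8-byte big-endian reversed timestamp, and the choice of one region per node are all illustrative assumptions. The resulting byte strings are the kind of values one would supply as split keys when creating the table.

```python
import bisect
import hashlib
import struct
from typing import List

LONG_MAX = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build md5(userid) + reversed timestamp (assumed layout: 16 + 8 bytes)."""
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    # Subtracting from LONG_MAX makes newer rows sort first within a user.
    reversed_ts = LONG_MAX - timestamp_ms
    return prefix + struct.pack(">q", reversed_ts)


def split_points(num_regions: int, prefix_len: int = 4) -> List[bytes]:
    """Return num_regions - 1 evenly spaced split keys over the hash prefix."""
    space = 256 ** prefix_len
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


def region_index(key: bytes, splits: List[bytes]) -> int:
    """Index of the region whose [start, end) range contains the key.

    Python compares bytes lexicographically as unsigned values, which
    matches the ordering HBase uses for row keys.
    """
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    # Assumption: one region per node for the 30-node cluster in the question.
    splits = split_points(30)
    key = row_key("user-42", 1346424900000)
    print(len(splits), "split keys; example key lands in region",
          region_index(key, splits))
```

Every row for a given user shares the same 16-byte hash prefix, so all of that user's rows fall between the same pair of split keys, while different users spread roughly evenly over the regions. More than one region per node may be preferable in practice; only the `num_regions` argument changes.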
> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> Ian).
>
> St.Ack
> 1. http://hbase.apache.org/book.html#rowkey.design
>