Re: md5 hash key and splits
Mohit Anchlia 2012-08-31, 14:55
On Thu, Aug 30, 2012 at 11:52 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <[EMAIL PROTECTED]>
> > In general isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> Time series data is a particular case and the sematextians have
> tools to help w/ that particular loading pattern. Is time series your
> loading pattern? If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting. But
> hotspotting is an issue apart from splits; you can split all you want
> and if your row keys are time series, splitting won't undo the hotspots.
My data is time series, and to get random distribution while still keeping
the keys for a user in the same region, I am thinking of using
md5(userid)+reversetimestamp as a row key. But with this type of key, how
can one do pre-splits? I have 30 nodes.
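
To make the question concrete, here is a minimal sketch of one way such a key and its pre-split points could be computed. It relies on MD5 output being close to uniformly distributed, so evenly spaced boundaries over the leading bytes of the hash should give regions of roughly equal size. The helper names (row_key, split_points, region_index), the choice of Long.MAX_VALUE minus the timestamp as the "reverse timestamp", and the two-byte boundary width are all assumptions made for illustration; none of them come from this thread or from HBase itself.

```python
import bisect
import hashlib
import struct

# Assumption: "reverse timestamp" means Long.MAX_VALUE - timestamp (milliseconds).
LONG_MAX = 2**63 - 1


def row_key(user_id, timestamp_ms):
    """Build md5(userid) + reverse timestamp: 16 hash bytes + 8 big-endian bytes."""
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    reverse_ts = LONG_MAX - timestamp_ms
    return prefix + struct.pack(">q", reverse_ts)


def split_points(num_regions, prefix_bytes=2):
    """Return num_regions - 1 evenly spaced boundaries over the first
    prefix_bytes bytes of the key space."""
    space = 256 ** prefix_bytes
    return [
        (i * space // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


def region_index(key, splits):
    """Index of the region whose [start, end) range holds key, using
    lexicographic byte order (a region's start key is inclusive)."""
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    splits = split_points(30)  # one region per node, as an example
    key = row_key("user-42", 1346424900000)
    print(len(splits), "split points; key falls in region", region_index(key, splits))
```

All rows for one user share the same 16-byte prefix, so they stay adjacent, and within that prefix newer timestamps sort first. The resulting boundaries would then be supplied as split keys when the table is created (for example via the createTable variant that accepts a byte[][] of split keys). One region per node is only a starting point here, not a recommendation.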
> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> 1. http://hbase.apache.org/book.html#rowkey.design