Re: md5 hash key and splits
On Thu, Aug 30, 2012 at 11:52 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <[EMAIL PROTECTED]>
> wrote:
> > In general, isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> >
>
> Time series data is a particular case [1] and the sematextians have
> tools to help w/ that particular loading pattern.  Is time series your
> loading pattern?  If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting.  But
> hotspotting is an issue apart from splits; you can split all you want,
> and if your row keys are time series, splitting won't undo the hotspotting.
>
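
A minimal sketch of the bucketing trick behind tools like hbasewd (not their
actual API; the class name, bucket count, and key layout are illustrative
assumptions): prefix each row key with a small salt derived from a stable
field so that writes for the same time window land on several regions instead
of one.

import org.apache.hadoop.hbase.util.Bytes;

// Sketch: a one-byte bucket prefix, derived from a hash of a stable field,
// fans sequential-timestamp writes out over NUM_BUCKETS regions instead of
// hammering a single "hot" one.
public class BucketedKey {
  static final int NUM_BUCKETS = 30;                      // e.g. roughly one per node

  static byte[] rowKey(String series, long timestampMillis) {
    byte bucket = (byte) ((series.hashCode() & 0x7fffffff) % NUM_BUCKETS);
    long reversedTs = Long.MAX_VALUE - timestampMillis;   // newest rows sort first
    return Bytes.add(new byte[] { bucket },
                     Bytes.toBytes(series),
                     Bytes.toBytes(reversedTs));
  }
}

The cost moves to the read side: a scan over one series has to be fanned out
across all bucket prefixes and the results merged, which is the kind of work
hbasewd automates.
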
My data is time series, and to get random distribution while still keeping a
user's keys in the same region I am thinking of using
md5(userid)+reversetimestamp as a row key. But with this type of key, how
can one do pre-splits? I have 30 nodes.
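
Because md5(userid) prefixes are uniformly distributed, evenly spaced split
points over the first key byte work. A minimal sketch against the 0.94-era
HBaseAdmin API, assuming the md5 prefix is stored as raw bytes and one region
per node to start; the table name "events" and column family "d" are
placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PresplitMd5Table {
  public static void main(String[] args) throws Exception {
    int numRegions = 30;

    // md5 prefixes are uniform over 0x00..0xFF, so evenly spaced one-byte
    // boundaries give roughly equal key ranges (29 split keys -> 30 regions).
    byte[][] splits = new byte[numRegions - 1][];
    for (int i = 1; i < numRegions; i++) {
      splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
    }

    HTableDescriptor desc = new HTableDescriptor("events");   // placeholder table name
    desc.addFamily(new HColumnDescriptor("d"));               // placeholder column family

    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    admin.createTable(desc, splits);
    admin.close();
  }
}

If the md5 prefix is hex-encoded rather than raw bytes, the bundled
RegionSplitter utility with its HexStringSplit algorithm computes the same
kind of evenly spaced boundaries for you.
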
> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> Ian).
>
> St.Ack
> 1. http://hbase.apache.org/book.html#rowkey.design
>