Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> md5 hash key and splits


Re: md5 hash key and splits
In general, isn't it better to split the regions so that the load can be
spread across the cluster to avoid hotspots?

I read about pre-splitting here:

http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
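
To make the pre-splitting idea concrete for MD5-hashed row keys, here is a minimal sketch, not a definitive implementation. It assumes the row key is the raw 16-byte MD5 digest, so keys are roughly uniform over the 128-bit space and evenly spaced split points should give evenly loaded regions. The function names md5_split_points and region_index are hypothetical helpers for illustration, not part of HBase.

```python
import hashlib
from bisect import bisect_right
from typing import List


def md5_split_points(num_regions: int) -> List[bytes]:
    """Return num_regions - 1 evenly spaced 16-byte split keys.

    Assumes row keys are raw MD5 digests (128 bits, compared as
    unsigned big-endian bytes, which matches lexicographic byte order).
    """
    if num_regions < 2:
        return []
    space = 1 << 128  # size of the MD5 key space
    return [
        (space * i // num_regions).to_bytes(16, "big")
        for i in range(1, num_regions)
    ]


def region_index(row_key: bytes, splits: List[bytes]) -> int:
    """Index of the region that would hold row_key.

    Region i covers [splits[i-1], splits[i]); a key equal to a split
    point belongs to the region that starts at that split point.
    """
    return bisect_right(splits, row_key)


# Hash sequential ids and count how many land in each of 4 regions.
splits = md5_split_points(4)
counts = [0] * 4
for i in range(10000):
    key = hashlib.md5(str(i).encode()).digest()
    counts[region_index(key, splits)] += 1
```

With four regions the split keys begin with the bytes 0x40, 0x80 and 0xc0, and the sequential ids end up spread across all four regions instead of piling onto one. Split keys computed this way could then be supplied when the table is created, so the initial load is spread from the start rather than beginning in a single region.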

On Thu, Aug 30, 2012 at 4:30 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:

> Also, you might have read that an initial loading of data can be better
> distributed across the cluster if the table is pre-split rather than
> starting with a single region and splitting (possibly aggressively,
> depending on the throughput) as the data loads in. Once you are in a stable
> state with regions distributed across the cluster, there is really no
> benefit in terms of spreading load by managing splitting manually vs.
> letting HBase do it for you. At that point it's about what Ian mentioned -
> predictability of latencies by avoiding splits happening at a busy time.
>
> On Thu, Aug 30, 2012 at 4:26 PM, Ian Varley <[EMAIL PROTECTED]>
> wrote:
>
> > The Facebook devs have mentioned in public talks that they pre-split their
> > tables and don't use automated region splitting. But as far as I remember,
> > the reason for that isn't predictability of spreading load, so much as
> > predictability of uptime & latency (they don't want an automated split to
> > happen at a random busy time). Maybe that's what you mean, Mohit?
> >
> > Ian
> >
> > On Aug 30, 2012, at 5:45 PM, Stack wrote:
> >
> > On Thu, Aug 30, 2012 at 7:35 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > From what I've read it's advisable to do manual splits since you are able
> > to spread the load in a more predictable way. If I am missing something
> > please let me know.
> >
> >
> > Where did you read that?
> > St.Ack
> >
> >
>