-Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store
I think you're slightly confused about the impact of using a hashed (or
sometimes called "salted") prefix for your rowkeys. This strategy for
rowkey design has an impact on the logical ordering of your data, not
necessarily the physical distribution of your data. In HBase, these are
orthogonal concerns. It means that to execute a bucket-agnostic query, the
client must initiate N scans. However, there's no guarantee that all
regions starting with the same hash land on the same RegionServer. Region
assignment is a complex beast; as I understand, it's based on a randomish,
Take a look at your existing table distributed on a size-N cluster. Do all
regions that fall within the first bucket sit on the same RegionServer?
Likely not. However, look at the number of regions assigned to each
RegionServer. This should be close to even. Adding a new RegionServer to
the cluster will result in some of those regions migrating from the other
servers to the new one. The impact will be a decrease in the average number
of regions served per RegionServer.
Your logical partitioning remains the same whether it's being served by N,
2N, or 3.5N+2 RegionServers. Your client always needs to execute that
bucket-agnostic query as N scans, touching each of the N buckets. Precisely
which RegionServers are touched by any given scan depends entirely on how
the balancer has distributed load on your cluster.
On Thu, Jun 27, 2013 at 5:02 PM, Joarder KAMAL <[EMAIL PROTECTED]> wrote:
> Thanks St.Ack for mentioning about the load-balancer.
> But my question was two folded:
> Case-1. If a new RS is added, then the load-balancer will do it's job
> considering no new region has been created in the meanwhile. // As you've
> already answered.
> Case-2. Whether a new RS is added or not, an existing region is splitted
> into two, then how the new writes will to the new region? Because, lets say
> initially the hash function was calculated with *N* Regions and now there
> are *N+1* Regions in the cluster.
> In that case, do I need to change the Hash function and reshuffle all the
> existing data within the cluster !! Or, HBase has some mechanism to handle
> Many thanks again for helping me out...
> Joarder Kamal
> On 28 June 2013 02:26, Stack <[EMAIL PROTECTED]> wrote:
> > On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <[EMAIL PROTECTED]>
> > > May be a simple question to answer for the experienced HBase users and
> > > developers:
> > >
> > > If I use hash partitioning to evenly distribute write workloads into my
> > > region servers and later add a new region server to scale or split an
> > > existing region, then do I need to change my hash function and
> > > all the existing data in between all the region servers (old and new)?
> > Or,
> > > is there any better solution for this? Any guidance would be very much
> > > helpful.
> > >
> > You do not need to change your hash function.
> > When you add a new regionserver, the balancer will move some of the
> > existing regions to the new host.
> > St.Ack