Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase, mail # dev - Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store


+
Joarder KAMAL 2013-06-26, 23:24
+
Stack 2013-06-27, 16:26
+
Joarder KAMAL 2013-06-28, 00:02
+
Nick Dimiduk 2013-07-02, 17:37
Copy link to this message
-
Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store
Joarder KAMAL 2013-07-03, 05:29
Dear Nick,

Thanks a lot for the nice explanation. I just left yet with a small
confusion:

As you told: Your logical partitioning remains the same whether it's being
served by N, 2N, or 3.5N+2 RegionServers.

When a region is splitted into two, I guess the logical partitioning is
changed, right? Could you kindly clarify a bit more.
And my question is without making any change to my initial
row-key-generation function, how new writes will go to the new regions?
I assume it is hard to predict the number of RS initially as well as
creating pre-splitted regions in very large-scale production systems. I am
not worried about the default load-balancing behaviour of HBase. St.Ack and
you've also clearly explained that as well.

For an example: as indicated in the Lars George's book where he used <# of
RS> while finding the prefix, I guess <# of Regions> could be also used (if
regions are pre-splitted)
----------------------------------------------------------------------------------------------------------------------------
*Salting*
You can use a salting prefix to the key that guarantees a spread of all
rows across all region servers. For example:

byte prefix = (byte) (Long.hashCode(timestamp) % <number of region
servers>);
byte[] rowkey = Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(timestamp);

This formula will generate enough prefix numbers to ensure that rows are
sent to all region servers. Of course, the formula assumes a
*specific*number of servers, and if
you are planning to *grow your cluster* you should set this number to a *
multiple* instead.
----------------------------------------------------------------------------------------------------------------------------

Thanks again ...


Regards,
Joarder Kamal
On 3 July 2013 03:37, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> Hi Joarder,
>
> I think you're slightly confused about the impact of using a hashed (or
> sometimes called "salted") prefix for your rowkeys. This strategy for
> rowkey design has an impact on the logical ordering of your data, not
> necessarily the physical distribution of your data. In HBase, these are
> orthogonal concerns. It means that to execute a bucket-agnostic query, the
> client must initiate N scans. However, there's no guarantee that all
> regions starting with the same hash land on the same RegionServer. Region
> assignment is a complex beast; as I understand, it's based on a randomish,
> load-based assignment.
>
> Take a look at your existing table distributed on a size-N cluster. Do all
> regions that fall within the first bucket sit on the same RegionServer?
> Likely not. However, look at the number of regions assigned to each
> RegionServer. This should be close to even. Adding a new RegionServer to
> the cluster will result in some of those regions migrating from the other
> servers to the new one. The impact will be a decrease in the average number
> of regions served per RegionServer.
>
> Your logical partitioning remains the same whether it's being served by N,
> 2N, or 3.5N+2 RegionServers. Your client always needs to execute that
> bucket-agnostic query as N scans, touching each of the N buckets. Precisely
> which RegionServers are touched by any given scan depends entirely on how
> the balancer has distributed load on your cluster.
>
> Thanks,
> Nick
>
> On Thu, Jun 27, 2013 at 5:02 PM, Joarder KAMAL <[EMAIL PROTECTED]> wrote:
>
> > Thanks St.Ack for mentioning about the load-balancer.
> >
> > But my question was two folded:
> > Case-1. If a new RS is added, then the load-balancer will do it's job
> > considering no new region has been created in the meanwhile. // As you've
> > already answered.
> >
> > Case-2. Whether a new RS is added or not, an existing region is splitted
> > into two, then how the new writes will to the new region? Because, lets
> say
> > initially the hash function was calculated with *N* Regions and now there
> > are *N+1* Regions in the cluster.
> >
> > ​In that case, do I need to change the Hash function and reshuffle all
+
Joarder KAMAL 2013-06-27, 13:44