Re: Finding the correct region server
If I understand you right, you are asking about how region splitting works ...
See http://hbase.apache.org/book/regions.arch.html section 9.7.4

In a nutshell, the parent region on your RS1 will split into two
daughter regions on the same RS1. If the load balancer is turned on,
the master can then "reassign" the daughter regions to other
RegionServers based on the number of regions being served by each RS.
This is unrelated to how many requests RSn may be receiving: the
"load" being balanced is just the number of regions each RS currently
serves.
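
To make the split-then-rebalance sequence concrete, here is a toy sketch
in Python. It is not HBase code: the function names, the region names
and the "move one region at a time" strategy are assumptions made up for
illustration, and the real balancer is more involved. It only shows the
two points above: daughters start on the parent's server, and balancing
looks at region counts, not request rates.

```python
def split_region(assignment, region, server):
    """Replace a parent region with two daughters on the SAME server."""
    assignment[server].remove(region)
    daughters = [region + "_a", region + "_b"]  # hypothetical naming
    assignment[server].extend(daughters)
    return daughters


def balance_by_region_count(assignment):
    """Move regions from the server with the most regions to the one with
    the fewest until the counts differ by at most one. Request volume is
    deliberately not considered."""
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return assignment
        assignment[idlest].append(assignment[busiest].pop())


# One region on RS1, nothing on RSn yet.
assignment = {"RS1": ["region1"], "RSn": []}

split_region(assignment, "region1", "RS1")
after_split = {rs: list(regions) for rs, regions in assignment.items()}

balance_by_region_count(assignment)
```

Right after the split both daughters still sit on RS1; only the
balancing step moves one of them to RSn.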

The scheme you describe below would only work in a very "static" data
/ region assignment scenario where a region will always stick to the
same RS until you manually move it around (load balancer turned off,
region size tuned up).
This is a highly recommended read:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
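
To illustrate why the sticky scheme only holds while assignments stay
static, here is a second toy sketch in Python. The RegionDirectory class
and its methods are hypothetical and are not the HBase client API; the
start key "5000" and the row key "1111" are just example values. The
idea it models is that a row key maps to a region by key range, and the
region maps to whichever RS currently hosts it, so a precomputed
"user 1111 lives on RS1" rule goes stale as soon as the region moves.

```python
import bisect


class RegionDirectory:
    """Toy lookup table: sorted region start keys -> hosting server."""

    def __init__(self):
        self.start_keys = []  # kept sorted
        self.servers = []     # servers[i] hosts the region starting at start_keys[i]

    def add_region(self, start_key, server):
        i = bisect.bisect_left(self.start_keys, start_key)
        self.start_keys.insert(i, start_key)
        self.servers.insert(i, server)

    def move_region(self, start_key, server):
        # Models a reassignment by the balancer or a manual move.
        self.servers[self.start_keys.index(start_key)] = server

    def locate(self, row_key):
        # The owning region is the one with the greatest start key <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        if i < 0:
            raise KeyError(row_key)
        return self.servers[i]


directory = RegionDirectory()
directory.add_region("", "RS1")      # first daughter: keys below "5000"
directory.add_region("5000", "RS1")  # second daughter: keys from "5000" up

before_move = directory.locate("1111")

# The first daughter is reassigned to another server.
directory.move_region("", "RSn")
after_move = directory.locate("1111")
```

Before the move the lookup for "1111" points at RS1; afterwards it
points at RSn, so a client pinned to Machine 1 for that user would now
be talking to a remote RegionServer.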

If you are worried about latency, I hope you have also read up on the
Block Cache and the MemStore, and on sizing them appropriately for your
workload.
--Suraj

On Fri, Jun 29, 2012 at 10:15 AM, Ramchander Varadarajan
<[EMAIL PROTECTED]> wrote:
> Hi all,
>
> We are evaluating HBase to store some metadata information on a very large scale. As of now, our architecture looks like this.
>
> Machine 1:
>     Runs Client 1
>     Runs Region Server 1
>     Runs Data Node 1
>
> Machine n:
>     Runs Client n
>     Runs Region Server n
>     Runs Data Node n
>
> Now, say, we have only one Region for the data set at the moment and it's maxing out, and the region is in Region Server 1. If a flood of new requests comes in to Machine n, and it tries to store the data, will Region Server n store it locally on its data node n, or will the requests be routed to Region Server 1, with a new region created there after it splits?
>
> The reason I ask is because I want to see if a Client can be made sticky to a region server. That way, if a user with an id 1111 comes in, he will be sent to Client 1 all the time, because we know Region Server 1 will have his region. We will know that by using his id to figure that out upfront. Just trying to minimize the latency further. ( Of course I understand that if nodes are down, there will be ways to route the traffic to another host to handle the users that fall in that bucket)
>
> thanks in advance