HBase >> mail # user >> several doubts about region split?

yonghu 2013-07-17, 13:53
Ted Yu 2013-07-17, 14:05
yonghu 2013-07-17, 14:10
Re: several doubts about region split?


On Wed, Jul 17, 2013 at 7:10 AM, yonghu <[EMAIL PROTECTED]> wrote:
> Thanks for your quick response!
> For question one, what will the latency be? How long do we need to wait
> until the daughter regions are online again?

Usually a matter of 1-2 seconds.

> regards!
> Yong
> On Wed, Jul 17, 2013 at 4:05 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> bq. Does it mean the region which will be split is not available
>> anymore?
>> Right.
>> bq. What happened to the read and write requests to that region?
>> The requests wouldn't be served by the hosting region server until daughter
>> regions become online.
>> Will try to dig up answer to question #2.
>> In short, the load balancer is supposed to offload one of the daughter
>> regions if a continuous write load occurs.
>> Cheers
>> On Wed, Jul 17, 2013 at 6:53 AM, yonghu <[EMAIL PROTECTED]> wrote:
>> > Dear all,
>> >
>> > The HBase reference book mentions that when a RegionServer splits a
>> > region, it offlines the parent region, adds the daughter regions to
>> > META, opens the daughters on the parent's hosting RegionServer, and
>> > then reports the split to the Master.
>> >
>> > I have several questions:
>> >
>> > 1. What does offline mean? Does it mean the region which will be split
>> > is not available anymore? What happens to the read and write requests to
>> > that region?
>> >
>> > 2. From the description, if I understand it right, the RegionServer
>> > will now contain two regions (one RegionServer for both daughter and
>> > parent regions) instead of one RegionServer for the daughter and one
>> > for the parent. If so, what are the benefits of this approach? The
>> > hot-spot problem is still there.

It's not a load problem, it's a data problem. We split when a region
has enough data. HBase then relies on the master to do some balancing
across the cluster.
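
To illustrate that a split is triggered by data size and not by request
load, here is a minimal sketch. It is not HBase's actual code: it assumes a
simplified constant-size rule (in HBase the size limit is configured through
`hbase.hregion.max.filesize`), and the `Region` class, the `should_split`
helper, and the 10 GB threshold are all hypothetical names and values chosen
for the example.

```python
from dataclasses import dataclass

# Illustrative threshold in bytes (assumption, not a recommended setting).
MAX_FILESIZE = 10 * 1024**3


@dataclass
class Region:
    name: str
    size_bytes: int        # amount of data stored in the region
    requests_per_sec: int  # load on the region; ignored by the split check


def should_split(region: Region, max_filesize: int = MAX_FILESIZE) -> bool:
    """Size-only rule: the request rate plays no part in the decision."""
    return region.size_bytes > max_filesize


# A hot but small region is not split; a quiet but large one is.
hot_small = Region("hot", 1 * 1024**3, 50000)
cold_big = Region("cold", 11 * 1024**3, 5)
print(should_split(hot_small), should_split(cold_big))
```

Spreading load is a separate step: after the split, both daughters sit on
the same RegionServer until the master's balancer moves one of them.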

>> > Moreover, this approach will be a big problem if we use the HBase
>> > default split approach. Suppose we bulk load data into an HBase
>> > cluster: initially every write request will be accepted by only one
>> > RegionServer. After some write requests, the RegionServer cannot
>> > respond to any write request as it reaches its disk volume threshold.
>> > Hence, some data must be moved from one RegionServer to another. The
>> > question is: why don't we do it at region split time?

Since you read the reference book, you will also find in there that we
recommend never bulk loading data into a table with only 1 region. You
should always create your tables with pre-defined splits if you plan
on importing a lot of data.
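
As a rough sketch of what pre-defined splits can look like, the snippet
below computes evenly spaced split keys and renders an HBase shell `create`
command with the `SPLITS` option. It assumes row keys are uniformly
distributed fixed-width hex strings (for example, hashed keys), which may
not match your data. The helper names `hex_split_keys` and
`hbase_shell_create`, and the table and family names, are hypothetical.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys.

    Assumes row keys are hex strings of key_width digits spread
    uniformly over the key space (an assumption about your data).
    """
    if num_regions < 2:
        return []
    space = 16 ** key_width
    return [
        format(i * space // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


def hbase_shell_create(table, family, split_keys):
    """Render an HBase shell command that creates a pre-split table."""
    keys = ", ".join("'{}'".format(k) for k in split_keys)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, keys)


# Example: a table with 4 regions over a 2-digit hex key space.
print(hbase_shell_create("t1", "f1", hex_split_keys(4, key_width=2)))
```

The same split keys can be passed from Java to `createTable` as a
`byte[][]`. With the table pre-split, a bulk import writes to several
regions from the start and does not funnel everything through one
RegionServer.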