HBase, mail # user - Re: Fixing badly distributed table manually.


Ivan Balashov 2012-12-24, 16:27
Mohit Anchlia 2012-12-24, 17:53
anil gupta 2012-12-24, 20:23

Re: Fixing badly distributed table manually.
Vincent Barat 2013-04-10, 16:31
Hi,

Sorry for not responding: I'm not on the list very often.

This seems to be of interest to some of you, so we will publish the
script on GitHub so that everybody can test and improve it.
More info later...

Regards,

On 24/12/12 21:23, anil gupta wrote:
> Hi Vincent,
>
> I don't know Python, but I am interested in learning about your solution. It
> would be great if you could also share the logic for balancing the cluster.
>
> Thanks,
> Anil Gupta
>
> On Mon, Dec 24, 2012 at 9:53 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Dec 24, 2012 at 8:27 AM, Ivan Balashov <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Vincent Barat <vbarat@...> writes:
>>>
>>>> Hi,
>>>>
>>>> Balancing regions between RSs is handled correctly by HBase: I mean
>>>> that your RSs always manage the same number of regions (the balancer
>>>> takes care of it).
>>>>
>>>> Unfortunately, balancing all the regions of one particular table
>>>> across the RSs of your cluster is not always easy, since HBase (as
>>>> of 0.90.3), when it splits a region, always creates the new one
>>>> on the same RS. This means that if you start with a single-region
>>>> table and then insert lots of data into it, new regions
>>>> will always be created on the same RS (if your insert is an M/R job,
>>>> you saturate this RS). Eventually, the balancer will at some point
>>>> decide to move some of these regions to other RSs, limiting the
>>>> issue, but this is not controllable.
>>>>
>>>> Here at Capptain, we solved this problem by developing a special
>>>> Python script, based on the HBase shell, that entirely
>>>> balances all the regions of all tables across all RSs. It ensures that
>>>> the regions of each table are uniformly deployed on all RSs of the
>>>> cluster, with a minimum of region transitions.
>>>>
>> Is it possible to describe, at a high level, the logic of what you did?
>>
>>>> It is fast, and even though it can trigger a lot of region transitions,
>>>> there is very little impact at runtime and it can be run safely.
>>>>
>>>> If you are interested, just let me know, I can share it.
>>>>
>>>> Regards,
>>>>
>>> Vincent,
>>>
>>> I would very much like to see and possibly use the script that you
>>> mentioned. We've just run into the same issue (after the table
>>> was truncated, it was re-created with only 1 region, and
>>> after data loading and manual splits we ended up with all
>>> regions on the same RS).
>>>
>>> If you could share the script, it would be really appreciated,
>>> I believe not only by me.
>>>
>>> Thanks,
>>> Ivan
>>>
>
>
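
The script itself is not included in this thread. As an illustration only,
the per-table balancing described above (spread each table's regions
uniformly over all RSs with as few region moves as possible) could be
sketched in Python as follows. This is a minimal sketch under stated
assumptions, not Capptain's actual script: the function names, the input
dictionary (region -> RS for one table) and the tie-breaking rule are all
hypothetical, and how the current assignment is obtained from the cluster is
left out. The last helper formats the plan as HBase shell `move` commands,
which expect an encoded region name and a server name.

```python
def plan_table_moves(region_to_server, servers):
    """Plan moves that spread ONE table's regions evenly over `servers`.

    region_to_server: dict mapping region name -> current RS (hypothetical input)
    servers: list of live RS names
    Returns a list of (region, source_rs, destination_rs) tuples.
    """
    # Group the table's regions by the RS currently hosting them.
    by_server = {s: [] for s in servers}
    for region, server in sorted(region_to_server.items()):
        by_server.setdefault(server, []).append(region)

    # Every RS should end up with `base` regions; `extra` of them get one more.
    base, extra = divmod(len(region_to_server), len(servers))

    # Assumption: to keep the number of moves low, the "+1" slots go to the
    # RSs that already host the most regions of this table.
    ranked = sorted(servers, key=lambda s: -len(by_server[s]))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

    # Regions above an RS's quota must leave it. An RS that is not in
    # `servers` has a quota of 0, so all of its regions are moved away.
    surplus = []
    for server, regions in by_server.items():
        over = len(regions) - quota.get(server, 0)
        if over > 0:
            surplus.extend((region, server) for region in regions[-over:])

    # Hand the surplus regions to the RSs that are below their quota.
    moves = []
    for dest in servers:
        deficit = quota[dest] - len(by_server[dest])
        for _ in range(max(deficit, 0)):
            region, source = surplus.pop()
            moves.append((region, source, dest))
    return moves


def to_shell_commands(moves):
    """Format planned moves as HBase shell `move` commands (strings only)."""
    return ["move '%s', '%s'" % (region, dest) for region, _source, dest in moves]


# Example: a table whose 4 regions all ended up on the same RS.
example = {"r1": "rs1", "r2": "rs1", "r3": "rs1", "r4": "rs1"}
for command in to_shell_commands(plan_table_moves(example, ["rs1", "rs2"])):
    print(command)
```

Running the planner once per table gives each table an even spread, but it
does not look at the total number of regions per RS, so a real script would
probably also need to vary which RSs receive the "+1" slots from one table to
the next.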