Re: HBase split policy
Jean-Marc Spaggiari 2013-01-22, 14:10
SPLIT_POLICY is defined the same way MAX_FILESIZE is, so I think
it's a table attribute and can be altered. That's good news! I will
probably try it.
Also, admin.split(rowkey) is what I will use until I'm able to
properly use/set the SPLIT_POLICY. I will simply (try to) count the
rows in a region and split in the middle.
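
To make the difference between "split in the middle by row count" and a size-based midpoint concrete, here is a minimal in-memory sketch. It is not HBase code: the `SplitPointSketch` class, the `Row` type and both method names are hypothetical, and `middleBySize` is only a rough stand-in for the midkey behaviour discussed in this thread (it works on whole rows, not on HFile blocks). The key it returns is the kind of value one would then pass to admin.split(rowkey).

```java
import java.util.List;

/** Simplified in-memory model of choosing a split key. Not HBase code. */
final class SplitPointSketch {

    /** Hypothetical pair: a row key and the number of bytes stored under it. */
    static final class Row {
        final String key;
        final long sizeBytes;

        Row(String key, long sizeBytes) {
            this.key = key;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Split by row count: returns the key of the middle row.
     * The list must already be sorted by key. The returned key is the
     * first row of the second daughter region.
     */
    static String middleByRowCount(List<Row> sortedRows) {
        requireSplittable(sortedRows);
        return sortedRows.get(sortedRows.size() / 2).key;
    }

    /**
     * Split by size: returns the first key for which the rows before it
     * hold at least half of the total bytes. Rows are never cut in two.
     */
    static String middleBySize(List<Row> sortedRows) {
        requireSplittable(sortedRows);
        long total = 0;
        for (Row r : sortedRows) {
            total += r.sizeBytes;
        }
        long before = 0;
        // Start at index 1 so the first daughter region is never empty.
        for (int i = 1; i < sortedRows.size(); i++) {
            before += sortedRows.get(i - 1).sizeBytes;
            if (2 * before >= total) {
                return sortedRows.get(i).key;
            }
        }
        // Only reached when the last row alone holds more than half the bytes.
        return sortedRows.get(sortedRows.size() - 1).key;
    }

    private static void requireSplittable(List<Row> rows) {
        if (rows.size() < 2) {
            throw new IllegalArgumentException("need at least two rows to split");
        }
    }
}
```

With one large row "A" followed by many small rows "AA" to "AZ" and a few more after them, `middleBySize` returns "AA", so the big row ends up alone in the first daughter, while `middleByRowCount` returns a key near the middle of the AA to AZ range. Which of the two is preferable depends on whether balanced row counts or balanced storage matters more for the table.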
Thanks for the hint regarding the SPLIT_POLICY.
2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>>>Also, last thing. If I want to change the default behaviour and split
>>>based on the row number instead of the midkey, can I hook somewhere?
> HTableDescriptor myHtd = new HTableDescriptor();
> So the region split policy can be changed only during table creation, I
> suppose. (I may be wrong; I'm not sure whether there is any other way.)
> When I mentioned splitting based on row key, my point was to use
> admin.split(rowkey). I will check your calculations and figures
> and get back to you.
> On Tue, Jan 22, 2013 at 7:17 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>> Hi Anoop, Hi Ram,
>> Thanks for your replies.
>> I looked at the code and found in the HFileBlockIndex the midkey
>> function which is doing the computation used in the
>> Store.getSplitPoint() method.
>> Now, if all the keys are almost equal in size, and the table has only
>> one big 10GB region, if we lower the maxfilesize parameter to
>> something like 300MB, we should see only regions of almost equal size,
>> right? That's not the result I got, so I'm trying to figure out where
>> I'm wrong.
>> Also, last thing. If I want to change the default behaviour and split
>> based on the row number instead of the midkey, can I hook somewhere?
>> Or will I have to disable the default split (by setting the
>> maxfilesize to something like 20GB) and run a job to split the regions
>> 2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>> > Hi Jean
>> > Before I reply with what I know: region splits can be configured too.
>> > OK, now on how the split happens:
>> > -> You can explicitly ask for the region to be split on a specific row,
>> > if you know that splitting on that rowkey will yield almost equal
>> > region sizes.
>> > -> Now when HBase tries to split, it just takes the midkey from the
>> > Here the midkey is the first key in the middle block of the HFile.
>> > Also, individual rows cannot be split. So if one row is nearly the
>> > size of the region and the other rows are smaller, it tries to find the
>> > block inside the HFile, and one of the blocks is going to be very
>> > large, and that may be split off as one region. I know this has to do
>> > with the internals of the splitting code.
>> > Regards
>> > Ram
>> > On Tue, Jan 22, 2013 at 5:12 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >> Hi,
>> >> I'm wondering what HBase's split policy is.
>> >> I mean, let's imagine this situation.
>> >> I have a region full of rows from AA to AZ. Hundreds of
>> >> thousands. I also have a few rows from B to DZ. Let's say only one
>> >> hundred.
>> >> The region is just under the maxfilesize, so it's fine.
>> >> Now, I add "A" and store a very big row into it. Almost half the size
>> >> of my maxfilesize value. That means it's now time to split this region.
>> >> How will HBase decide where to split it? Is it going to use the
>> >> lexical order? Which means it will split somewhere between B and C? If
>> >> it's done that way, I will have one VERY small region, and one VERY
>> >> big which will still be over the maxfilesize and will need to be split
>> >> again, and most probably many times, right?
>> >> Or will HBase take the middle of the region, look at the closest key,