HBase, mail # user - HBase split policy


Re: HBase split policy
Jean-Marc Spaggiari 2013-01-22, 14:10
Hi Ram,

SPLIT_POLICY is defined the same way MAX_FILESIZE is, so I think
it's a table attribute and can be altered. That's good news! I will
probably try it.

Also, admin.split(rowkey) is what I will use until I'm able to
properly set the SPLIT_POLICY. I will simply (try to) count the
rows in a region, and split in the middle...
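
The "count the rows and split in the middle" idea can be sketched without any HBase dependency. This is only an illustration: RowCountSplit and middleRowKey are hypothetical names, not HBase classes, and the sketch assumes all row keys of the region fit in memory and are plain ASCII strings. On a real region the keys would have to come from a scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: picks a split point by row count.
final class RowCountSplit {

    /**
     * Returns the row key in the middle of the region by row count, or null
     * if the region holds fewer than two rows and so cannot be split.
     * The input is assumed to hold every row key of the region, in any order.
     */
    static String middleRowKey(List<String> rowKeys) {
        if (rowKeys.size() < 2) {
            return null;
        }
        List<String> sorted = new ArrayList<>(rowKeys);
        // For ASCII keys, String ordering matches lexicographic byte ordering.
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("AA", "AB", "AC", "AD", "B", "C");
        System.out.println(middleRowKey(keys)); // prints AD
    }
}
```

The returned key is the candidate to hand to admin.split(rowkey). It balances the number of rows on each side, not the number of bytes, so it only gives equal-sized regions when rows are of similar size.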

Thanks for the hint regarding the SPLIT_POLICY.

JM

2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>>>Also, last thing. If I want to change the default behaviour and split
>>>based on the row number instead of the midkey, can I hook somewhere?
>
> HTableDescriptor myHtd = new HTableDescriptor();
> myHtd.setValue(HTableDescriptor.SPLIT_POLICY,
>     KeyPrefixRegionSplitPolicy.class.getName());
> So the region split policy can be changed only during table creation, I
> suppose.  (I may be wrong, not sure if there is any other way.)
>
> When I said split based on row key, I meant using
> admin.split(rowkey).  I will check more on your calculations and figures
> and get back to you.
>
> Regards
> Ram
>
>
> On Tue, Jan 22, 2013 at 7:17 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Anoop, Hi Ram,
>>
>> Thanks for your replies.
>>
>> I looked at the code and found in the HFileBlockIndex the midkey
>> function which is doing the computation used in the
>> Store.getSplitPoint() method.
>>
>> Now, if all the keys are almost equal in size, and the table has only
>> one big 10GB region, if we lower the maxfilesize parameter to
>> something like 300MB, we should see only almost equal regions, right?
>> It's not the result I got. So I'm trying to figure out where I'm wrong.
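
As a rough check on that expectation, the arithmetic can be worked through under a simplifying assumption: every split cuts a region into two halves of equal size, and splitting continues while a region is larger than maxfilesize. HalvingEstimate is a hypothetical name, and the model ignores compactions, multiple stores, and whichever split policy is actually configured.

```java
// Hypothetical back-of-the-envelope model, not HBase code.
final class HalvingEstimate {

    /** Number of regions once repeated halving brings each one to the limit or below. */
    static long regionsAfterHalving(long regionMb, long maxFileSizeMb) {
        if (regionMb < 0 || maxFileSizeMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long regions = 1;
        double sizeMb = regionMb;
        while (sizeMb > maxFileSizeMb) {
            sizeMb /= 2;   // assumption: a split always yields two equal halves
            regions *= 2;
        }
        return regions;
    }

    public static void main(String[] args) {
        long regions = regionsAfterHalving(10 * 1024, 300);
        System.out.println(regions);             // prints 64
        System.out.println(10 * 1024 / regions); // prints 160
    }
}
```

Under these assumptions a 10GB region ends up as 64 regions of about 160MB each, not regions close to 300MB. Uneven results would then point to splits that do not divide the bytes in half.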
>>
>> Also, last thing. If I want to change the default behaviour and split
>> based on the row number instead of the midkey, can I hook somewhere?
>>
>
>
>> Or will I have to disable the default split (by setting the
>> maxfilesize to something like 20GB) and run a job to split the regions
>> manually?
>>
>> Thanks,
>>
>> JM
>>
>> 2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>> > Hi Jean
>> >
>> > Before replying with what I know: region splits can be configured too.
>> >
>> > Ok, now on how the split happens
>> > -> You can explicitly ask for the region to be split on a specific row
>> > key, if you know that splitting on that rowkey will yield almost equal
>> > region sizes.
>> > -> Now when HBase tries to split, it just takes the midkey from the
>> > HFiles. Here the midkey is the first key in the mid block of the HFile.
>> > Also, individual rows cannot be split. So if one row is nearly the size
>> > of the region and the other rows are smaller, it tries to find the mid
>> > block inside the HFile, and the size of one of the blocks is going to be
>> > very huge and that may be split off as one region.  I know this has to
>> > do with the internals of the splitting code.
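
The effect described here can be reproduced with a toy model. This is a sketch under loud assumptions, not HBase's implementation: MidBlockSplit, Row, pack, midKey and bytesBefore are hypothetical names, each row is treated as one indivisible entry, a block is closed as soon as it reaches the target size, and the midkey is the first key of the block in the middle of the block list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "midkey = first key of the mid block". Not HBase code.
final class MidBlockSplit {

    static final class Row {
        final String key;
        final long bytes;
        Row(String key, long bytes) { this.key = key; this.bytes = bytes; }
    }

    /** Packs rows (already sorted by key) into blocks of roughly blockBytes. */
    static List<List<Row>> pack(List<Row> sortedRows, long blockBytes) {
        List<List<Row>> blocks = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        long used = 0;
        for (Row r : sortedRows) {
            current.add(r);            // a row is never cut across two blocks
            used += r.bytes;
            if (used >= blockBytes) {  // close the block once it is full
                blocks.add(current);
                current = new ArrayList<>();
                used = 0;
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    /** First key of the block in the middle of the block list. */
    static String midKey(List<List<Row>> blocks) {
        if (blocks.isEmpty()) {
            throw new IllegalArgumentException("no blocks");
        }
        return blocks.get(blocks.size() / 2).get(0).key;
    }

    /** Bytes that would land in the first daughter region (keys below splitKey). */
    static long bytesBefore(List<Row> rows, String splitKey) {
        long total = 0;
        for (Row r : rows) {
            if (r.key.compareTo(splitKey) < 0) {
                total += r.bytes;
            }
        }
        return total;
    }

    /** One huge row "A", ten small rows AA..AJ, then B and C. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("A", 500));
        for (int i = 0; i < 10; i++) {
            rows.add(new Row("A" + (char) ('A' + i), 10));
        }
        rows.add(new Row("B", 10));
        rows.add(new Row("C", 10));
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        String mid = midKey(pack(rows, 20));
        System.out.println(mid);                    // prints AE
        System.out.println(bytesBefore(rows, mid)); // prints 540
    }
}
```

With a 20-byte block target the 620 bytes of sample data fall into 7 blocks, the mid block starts at AE, and the split leaves 540 bytes on one side and 80 on the other. Counting blocks treats the one oversized block like any other, which is why the halves come out unequal.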
>> >
>> >
>> > Regards
>> > Ram
>> >
>> > On Tue, Jan 22, 2013 at 5:12 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm wondering what HBase's split policy is.
>> >>
>> >> I mean, let's imagine this situation.
>> >>
>> >> I have a region full of rows from AA to AZ, hundreds of thousands
>> >> of them. I also have a few rows from B to DZ, let's say only one
>> >> hundred.
>> >>
>> >> The region is just below the maxfilesize, so it's fine.
>> >>
>> >> Now, I add "A" and store a very big row into it, almost half the size
>> >> of my maxfilesize value. That means it's now time to split this region.
>> >>
>> >> How will HBase decide where to split it? Is it going to use the
>> >> lexical order, which means it will split somewhere between B and C? If
>> >> it's done that way, I will have one VERY small region, and one VERY
>> >> big one which will still be over the maxfilesize and will need to be
>> >> split again, and most probably many times, right?
>> >>
>> >> Or will HBase take the middle of the region, look at the closest key,