Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> HBase split policy


Jean-Marc Spaggiari 2013-01-22, 11:42
Anoop Sam John 2013-01-22, 12:24
ramkrishna vasudevan 2013-01-22, 13:38
Jean-Marc Spaggiari 2013-01-22, 13:47
ramkrishna vasudevan 2013-01-22, 14:02
Re: HBase split policy
Hi Ram,

I see SPLIT_POLICY is defined the same way MAX_FILESIZE is... So I think
it's a table attribute and can be altered. That's good news! I will
probably try it.
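A rough plain-Java model of why this works: like MAX_FILESIZE, the split policy is stored as a string value in the table descriptor, and that string is a class name that the region server instantiates by reflection. This is a simplified sketch under that assumption, not HBase's code; `SplitPolicyLookup`, `SplitPolicy`, `MidKeyPolicy`, `RowCountPolicy`, and the attribute map are hypothetical stand-ins for `RegionSplitPolicy` and `HTableDescriptor`.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitPolicyLookup {
    // Hypothetical stand-in for HBase's RegionSplitPolicy.
    public interface SplitPolicy {
        String describe();
    }

    public static class MidKeyPolicy implements SplitPolicy {
        public String describe() { return "split at the HFile midkey"; }
    }

    public static class RowCountPolicy implements SplitPolicy {
        public String describe() { return "split at the middle row"; }
    }

    // Hypothetical attribute key, mirroring HTableDescriptor.SPLIT_POLICY.
    public static final String SPLIT_POLICY = "SPLIT_POLICY";

    // Reads the policy class name from the table attributes (falling back to
    // a default class) and instantiates it by reflection.
    public static SplitPolicy resolve(Map<String, String> tableAttributes) {
        String className = tableAttributes.getOrDefault(
                SPLIT_POLICY, MidKeyPolicy.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(SplitPolicy.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot load split policy " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        System.out.println(resolve(attrs).describe());  // default policy
        // "Altering" the table only replaces the stored class name.
        attrs.put(SPLIT_POLICY, RowCountPolicy.class.getName());
        System.out.println(resolve(attrs).describe());
    }
}
```

In this model, altering the table just swaps the stored class name; on a real cluster the new policy presumably only applies once the regions pick up the modified descriptor (for example after being reopened).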

Also, admin.split(rowkey) is what I will use until I'm able to
properly use/set the SPLIT_POLICY. I will simply (try to) count the
rows in a region and split in the middle...
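Counting the rows and splitting in the middle amounts to taking the key at position n/2 of the region's sorted row keys. A minimal sketch, assuming the keys were already collected in order (for example by a scan) and are plain ASCII, so Java's String order matches HBase's byte order; `RowCountSplit` and `middleRowKey` are hypothetical names, and the returned key is what would then be handed to admin.split(...).

```java
import java.util.List;

public class RowCountSplit {
    // Returns the row key that divides the sorted keys into two halves of
    // (nearly) equal row count. The returned key becomes the first row of
    // the second daughter region.
    public static String middleRowKey(List<String> sortedRowKeys) {
        if (sortedRowKeys.size() < 2) {
            throw new IllegalArgumentException("Need at least two rows to split");
        }
        return sortedRowKeys.get(sortedRowKeys.size() / 2);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("A", "AA", "AB", "AC", "B", "C");
        System.out.println(middleRowKey(keys));  // prints AC
    }
}
```

This balances row counts, not bytes: with one very large row in the region, the two halves can still differ a lot in size on disk.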

Thanks for the hint regarding the SPLIT_POLICY.

JM

2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>>>Also, one last thing. If I want to change the default behaviour and split
>>>based on the row number instead of the midkey, is there somewhere I can hook in?
>
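> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy;
> HTableDescriptor myHtd = new HTableDescriptor();
> // Store the split policy class name as a table attribute
> myHtd.setValue(HTableDescriptor.SPLIT_POLICY,
>     KeyPrefixRegionSplitPolicy.class.getName());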
> So I suppose the region split policy can only be changed at table
> creation time. (I may be wrong; not sure if there is any other way.)
>
> When I said split based on row key, I meant using admin.split(rowkey).
> I will look more closely at your calculations and figures and get back
> to you.
>
> Regards
> Ram
>
>
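For reference, when a region is split at a row key, that key becomes the start key of the second daughter: the first daughter covers [startKey, splitKey) and the second covers [splitKey, endKey). The sketch below reproduces that partitioning with a TreeMap of hypothetical row sizes (in MB) standing in for a region; `ExplicitSplit`, `splitAt`, and `totalSize` are illustrative names only, not HBase APIs.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ExplicitSplit {
    // Splits a region at splitKey: the first daughter gets the rows strictly
    // before the key, the second gets the key itself and everything after it.
    public static List<NavigableMap<String, Long>> splitAt(
            NavigableMap<String, Long> region, String splitKey) {
        return List.of(region.headMap(splitKey, false),
                       region.tailMap(splitKey, true));
    }

    // Sum of the hypothetical row sizes in one region.
    public static long totalSize(NavigableMap<String, Long> rows) {
        long sum = 0;
        for (long size : rows.values()) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical region: row key -> row size in MB.
        NavigableMap<String, Long> region = new TreeMap<>();
        region.put("A", 150L);
        region.put("AA", 40L);
        region.put("AZ", 40L);
        region.put("B", 5L);
        region.put("DZ", 5L);

        for (NavigableMap<String, Long> daughter : splitAt(region, "AA")) {
            System.out.println(daughter.keySet() + " -> " + totalSize(daughter) + " MB");
        }
    }
}
```

Splitting at "AA" isolates the big row "A" in the first daughter. No split key can put part of row "A" on each side, since split points are always row keys.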
> On Tue, Jan 22, 2013 at 7:17 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Anoop, Hi Ram,
>>
>> Thanks for your replies.
>>
>> I looked at the code and found the midkey function in HFileBlockIndex,
>> which does the computation used by the Store.getSplitPoint() method.
>>
>> Now, if all the keys are almost equal in size and the table has only
>> one big 10GB region, lowering the maxfilesize parameter to something
>> like 300MB should give us regions of almost equal size, right? That's
>> not the result I got, so I'm trying to figure out where I'm wrong.
>>
>> Also, one last thing. If I want to change the default behaviour and
>> split based on the row number instead of the midkey, is there somewhere
>> I can hook in?
>>
>> Or will I have to disable the default split (by setting the
>> maxfilesize to something like 20GB) and run a job to split the regions
>> manually?
>>
>> Thanks,
>>
>> JM
>>
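JM's expectation can be checked with a little arithmetic. Assuming every split cuts a region exactly in half and a region is split whenever it is larger than maxfilesize (a constant threshold), a 10GB region ends up as 2^k regions of 10GB / 2^k each, for the smallest k that brings that size to 300MB or less. `HalvingSplits` is a hypothetical helper that only does this calculation, taking 10GB as 10240MB.

```java
public class HalvingSplits {
    // Returns {number of regions, size of each region in MB} after repeatedly
    // halving until no region is larger than maxFileSizeMb.
    public static long[] splitUntilBelow(long regionSizeMb, long maxFileSizeMb) {
        if (maxFileSizeMb <= 0) {
            throw new IllegalArgumentException("maxFileSizeMb must be positive");
        }
        long regions = 1;
        long size = regionSizeMb;
        while (size > maxFileSizeMb) {
            size /= 2;     // assumes every split is an exact halving
            regions *= 2;  // assumes every region splits in the same round
        }
        return new long[] {regions, size};
    }

    public static void main(String[] args) {
        long[] result = splitUntilBelow(10 * 1024, 300);
        System.out.println(result[0] + " regions of ~" + result[1] + " MB");
    }
}
```

Under these assumptions the answer is 64 regions of about 160MB, not regions of about 300MB, because repeated halving overshoots the threshold. Real split points come from the HFile midkey rather than an exact halfway point, so actual region sizes would spread around that figure.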
>> 2013/1/22, ramkrishna vasudevan <[EMAIL PROTECTED]>:
>> > Hi Jean
>> >
>> > Before I reply with what I know: region splits can be configured too.
>> >
>> > OK, now on how the split happens:
>> > -> You can explicitly ask for the region to be split on a specific row
>> >    key, if you know that splitting on that row key will give you
>> >    regions of almost equal size.
>> > -> Now, when HBase tries to split, it just takes the midkey from the
>> >    HFiles. Here the midkey is the first key in the middle block of the
>> >    HFile.
>> > Also, individual rows cannot be split. So if one row is nearly the size
>> > of the region and the other rows are smaller, HBase still looks for the
>> > middle block inside the HFile; the block holding that one row is going
>> > to be huge, and it may be split off as a region of its own. I know this
>> > has to do with the internals of the splitting code.
>> >
>> >
>> > Regards
>> > Ram
>> >
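Ram's description of the automatic split point can be modeled in a few lines. This is a simplified sketch, not HBase's code: it assumes an HFile is just a list of blocks, each summarized by its first row key and its size, and takes the midkey as the first key of the middle block by position. `MidKeySplit`, `Block`, and the sizes are hypothetical; the first block stands in for JM's single very large row "A".

```java
import java.util.List;

public class MidKeySplit {
    // Hypothetical summary of one HFile block: its first row key and its size.
    public static final class Block {
        final String firstKey;
        final long sizeMb;

        public Block(String firstKey, long sizeMb) {
            this.firstKey = firstKey;
            this.sizeMb = sizeMb;
        }
    }

    // Midkey as described in the thread: the first key of the middle block,
    // chosen by block position, not by accumulated size.
    public static String midKey(List<Block> blocks) {
        if (blocks.isEmpty()) {
            throw new IllegalArgumentException("No blocks");
        }
        return blocks.get((blocks.size() - 1) / 2).firstKey;
    }

    // Size of the blocks that land in the first daughter, i.e. the blocks
    // whose first key sorts before the split key.
    public static long sizeBefore(List<Block> blocks, String splitKey) {
        long sum = 0;
        for (Block b : blocks) {
            if (b.firstKey.compareTo(splitKey) < 0) {
                sum += b.sizeMb;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("A", 150),  // one huge row stored in one huge block
                new Block("AA", 10),
                new Block("AH", 10),
                new Block("AP", 10),
                new Block("B", 10));
        String mid = midKey(blocks);
        long total = 0;
        for (Block b : blocks) {
            total += b.sizeMb;
        }
        long first = sizeBefore(blocks, mid);
        System.out.println("midkey=" + mid + ", daughters: "
                + first + "MB / " + (total - first) + "MB");
    }
}
```

With these numbers the midkey is "AH" and the daughters are 160MB and 30MB: because the middle is picked by block count, one oversized block pulls most of the bytes into the first daughter. The real lookup in HFileBlockIndex also has to deal with multi-level indexes, so treat this only as an illustration of the skew.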
>> > On Tue, Jan 22, 2013 at 5:12 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm wondering what HBase's split policy is.
>> >>
>> >> I mean, let's imagine this situation.
>> >>
>> >> I have a region full of rows from AA to AZ, hundreds of thousands of
>> >> them. I also have a few rows from B to DZ, let's say only one
>> >> hundred.
>> >>
>> >> The region is just below the maxfilesize, so it's fine.
>> >>
>> >> Now, I add row "A" and store a lot of data in it, almost half the size
>> >> of my maxfilesize value. That means it's now time to split this region.
>> >>
>> >> How will HBase decide where to split it? Is it going to use the
>> >> lexical order, which means it will split somewhere between B and C? If
>> >> it's done that way, I will have one VERY small region and one VERY
>> >> big one, which will still be over the maxfilesize and will need to be
>> >> split again, most probably many times, right?
>> >>
>> >> Or will HBase take the middle of the region, look at the closest key,
Jean-Marc Spaggiari 2013-01-23, 02:39
Anoop Sam John 2013-01-23, 06:17
Jean-Marc Spaggiari 2013-01-23, 12:26
ramkrishna vasudevan 2013-01-23, 18:09
Jean-Marc Spaggiari 2013-01-23, 18:24