It starts with small regions and then ramps up the split size as the table grows.
That makes for a more even distribution of regions.
How many regions did you pre-split your table into? How many region servers are available in this cluster?
Perhaps that number was lower than what the policy estimated to be a good number of regions for your cluster.
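A minimal Java sketch of how the two policies differ, assuming the 0.94 behaviour of IncreasingToUpperBoundRegionSplitPolicy as we understand it: the split threshold is min(MAX_FILESIZE, memstore flush size × R²), where R is the number of regions of the same table hosted on that region server. The class `SplitThresholdSketch` and its methods are hypothetical, and the 128MB flush size is an assumed default rather than a value from this thread.

```java
// Hypothetical illustration class; not part of HBase.
public class SplitThresholdSketch {
    static final long MB = 1024L * 1024L;
    // hbase.hregion.max.filesize from the thread: 10737418240 bytes = 10 * 1024^3 (10GB).
    static final long MAX_FILESIZE = 10737418240L;
    // Assumption: memstore flush size left at a 128MB default.
    static final long FLUSH_SIZE = 128L * MB;

    // ConstantSizeRegionSplitPolicy: the threshold never changes.
    public static long constantThreshold() {
        return MAX_FILESIZE;
    }

    // Assumed 0.94 IncreasingToUpperBoundRegionSplitPolicy rule:
    // min(maxFileSize, flushSize * R^2), R = regions of this table on this server.
    public static long increasingThreshold(int regionsOnServer) {
        if (regionsOnServer == 0) {
            // Guard: nothing counted, fall back to the configured maximum.
            return MAX_FILESIZE;
        }
        long r = regionsOnServer;
        return Math.min(MAX_FILESIZE, FLUSH_SIZE * r * r);
    }

    public static void main(String[] args) {
        // Print both thresholds (in MB) as the per-server region count grows.
        for (int r = 1; r <= 10; r++) {
            System.out.printf("regions=%2d  increasing=%6d MB  constant=%6d MB%n",
                    r, increasingThreshold(r) / MB, constantThreshold() / MB);
        }
    }
}
```

Under these assumptions the threshold is 128MB, 512MB, 1152MB, ... and only reaches the full 10GB once nine or more regions of the table sit on the same server, so regions on a lightly loaded server can split well below MAX_FILESIZE.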
----- Original Message -----
From: Vladimir Rodionov <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Sunday, July 28, 2013 10:25 PM
Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
Thanks, Ted. What is the rationale behind IncreasingToUpperBoundRegionSplitPolicy?
Why is it better than ConstantSizeRegionSplitPolicy?
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
From: Ted Yu [[EMAIL PROTECTED]]
Sent: Sunday, July 28, 2013 8:52 PM
To: [EMAIL PROTECTED]
Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
In 0.94 there are several split policies available:
* @see IncreasingToUpperBoundRegionSplitPolicy Default split policy since 0.94.0
* @see ConstantSizeRegionSplitPolicy Default split policy before 0.94.0
On Sun, Jul 28, 2013 at 8:39 PM, Vladimir Rodionov
> Yes, I pre-split the table.
> Out of 109 regions, only 3 are empty (wrong assumption about key distribution).
> If regions were split at 10GB, we would have > 500GB in 109 regions, not 376GB.
> I do not understand how the region splitting algorithm works.
> Best regards,
> Vladimir Rodionov
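A back-of-the-envelope check of the >500GB estimate in the message above, as a small Java sketch. It assumes each split at 10GB leaves two daughters of roughly 5GB each and ignores the 3 empty regions; the class `SplitArithmetic` and its method are hypothetical. As Jean-Marc points out below, daughters are not necessarily equal in size, so this is a rough figure rather than a strict bound.

```java
// Hypothetical helper for the arithmetic in the message above; not HBase code.
public class SplitArithmetic {
    // Rough estimate of total data if every non-empty region were a daughter
    // of a split at splitSizeGb, i.e. held about half of that size.
    public static double estimatedTotalGb(int totalRegions, int emptyRegions, double splitSizeGb) {
        int nonEmpty = totalRegions - emptyRegions;
        return nonEmpty * (splitSizeGb / 2.0);
    }

    public static void main(String[] args) {
        // Numbers from the thread: 109 regions, 3 empty, 10GB split size.
        double estimate = estimatedTotalGb(109, 3, 10.0);
        System.out.printf("estimated ~%.0f GB vs. observed 376 GB%n", estimate);
    }
}
```

With the thread's numbers this gives 106 × 5GB = 530GB, which is where the "> 500GB" expectation comes from.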
> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> Sent: Sunday, July 28, 2013 5:50 PM
> To: [EMAIL PROTECTED]
> Subject: Re: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> "Yes it works, of course." It's not working for me ;) so I was not sure.
> It's normal to have regions under half of MAX_FILESIZE. When a region
> grows beyond MAX_FILESIZE, it is split in two. So one daughter can be larger
> than half, and the other one smaller.
> I would say an average of 5GB would have been a good value, but even 3.6GB is
> still not so bad.
> Did you pre-split the regions initially? Is it possible that you have
> unused pre-split regions?
> You can use Hannibal to get a quick view of what the region sizes are.
> 2013/7/28 Vladimir Rodionov <[EMAIL PROTECTED]>
> > The final stats:
> > Total HDFS size - 376GB
> > #regions: 109 - avg. region size ~ 3.6GB
> > Something is wrong here. I expected fewer regions. The regions get split
> > at sizes much lower than
> > hbase.hregion.max.filesize and/or MAX_FILESIZE.
> > Best regards,
> > Vladimir Rodionov
> > ________________________________________
> > From: Vladimir Rodionov
> > Sent: Sunday, July 28, 2013 3:35 PM
> > To: [EMAIL PROTECTED]
> > Subject: RE: MAX_FILESIZE and hbase.hregion.max.filesize are both 10Gb
> > Yes it works, of course.
> > It's in the original post: ~10GB
> > <property>
> > <name>hbase.hregion.max.filesize</name>
> > <value>10737418240</value>
> > <source>hbase-site.xml</source>
> > </property>