Re: bulk loading regions number
Marcos Ortiz 2012-09-10, 13:17
Well, the default maximum size for a region (hbase.hregion.max.filesize) is
256 MB, so if you want to store a lot of data, you should consider
increasing that value.
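To see how much the region size matters, here is a rough sketch of the arithmetic. It assumes regions fill up to the configured maximum before they split, so 200 regions at 256 MB is only an upper bound on one week of data; the class name, method name and the 4 GB figure are purely illustrative.

```java
// Rough estimate only: assumes regions fill to the configured maximum
// before splitting, and ignores compression and compaction effects.
final class RegionEstimate {

    /** Regions needed to hold totalMb of data at maxRegionMb each, rounded up. */
    static long regionsFor(long totalMb, long maxRegionMb) {
        if (totalMb < 0 || maxRegionMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (totalMb + maxRegionMb - 1) / maxRegionMb;
    }

    public static void main(String[] args) {
        // Upper bound for one week of data: 200 regions of 256 MB (from the thread).
        long weeklyMb = 200L * 256;
        System.out.println(regionsFor(weeklyMb, 256));       // 256 MB regions
        System.out.println(regionsFor(weeklyMb, 4 * 1024));  // 4 GB regions (illustrative)
    }
}
```

Under those assumptions the same week of data drops from 200 regions to 13 when the maximum goes from 256 MB to 4 GB.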
With the preSplit() method, you can control how the table is divided into regions up front.
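As a minimal sketch of what pre-splitting could look like, the helper below computes evenly spaced single-byte split points for a target number of regions. It assumes row keys are spread roughly uniformly over their first byte (for example, because they start with a hash); the class and method names are hypothetical and not part of HBase.

```java
// Hypothetical helper: evenly spaced one-byte split keys.
// Assumes row keys are uniformly distributed over their first byte.
final class SplitKeys {

    /** Returns (regions - 1) split keys dividing the first-byte range 0..255 evenly. */
    static byte[][] evenSplits(int regions) {
        if (regions < 2 || regions > 256) {
            throw new IllegalArgumentException("regions must be between 2 and 256");
        }
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            // Values above 127 become negative Java bytes, but HBase compares
            // row keys as unsigned bytes, so the ordering is preserved.
            splits[i - 1] = new byte[] { (byte) (i * 256 / regions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = evenSplits(30);  // ~30 regions, as suggested in the thread
        System.out.println(splits.length + " split keys");
    }
}
```

The resulting array could then be passed as the splitKeys argument of HBaseAdmin.createTable(HTableDescriptor, byte[][]) when creating the table, so the table starts with about 30 regions instead of growing by splits.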
On 09/10/2012 04:45 AM, Oleg Ruchovets wrote:
> That is actually what I am thinking about too.
> What is the best practice for choosing the HFile size?
> What is the penalty for making it very large?
> On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Hi Oleg,
>> If the root issue is a growing number of regions, why not control that
>> instead of a way to control the Reducer count? You could, for example,
>> raise the split-point sizes for HFiles, to not have it split too much,
>> and hence have larger but fewer regions?
>> Given that you have 10 machines, I'd go this way rather than ending up
>> with a lot of regions causing issues with load.
>> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> I am using bulk loading to write my data to HBase.
>>> It works fine, but the number of regions is growing very rapidly.
>>> After loading ONE WEEK of data I got 200 regions (I am going to save years of
>>> data).
>>> As a result, the job that writes data to HBase has a number of REDUCERS equal
>>> to the number of REGIONS.
>>> So after loading only one WEEK of data I have 200 reducers.
>>> How can I resolve the problem of a constantly growing number of reducers when
>>> using bulk loading and TotalOrderPartitioner?
>>> I have a 10-machine cluster and I think I should have ~30 reducers.
>>> Thanks in advance.
>> Harsh J
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
My Blog <http://marcosluis2186.posterous.com>
Tumblr's blog <http://marcosortiz.tumblr.com/>