Well, the default maximum size for a region is 256 MB, so if you want
to store a lot of data, you should consider increasing that value.
With the preSplit() method, you can control how the table is split.
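
To put rough numbers on this, here is a minimal sketch in plain Java. It
does not call any HBase API. The class and method names are hypothetical,
the 4 GB target is only an illustrative value, and it assumes row keys
that start with a two-digit bucket prefix ("00".."99"), which is not
something stated in this thread. It also assumes each of the 200 regions
is full at 256 MB, which is an upper bound on the weekly data volume.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase.
class RegionPlanning {

    /** Number of regions needed for totalBytes when a region holds at most maxRegionBytes (rounded up). */
    static long estimateRegionCount(long totalBytes, long maxRegionBytes) {
        if (totalBytes < 0 || maxRegionBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (totalBytes + maxRegionBytes - 1) / maxRegionBytes;
    }

    /**
     * Evenly spaced split keys for row keys that begin with a zero-padded
     * two-digit bucket in [0, numBuckets). Assumed key design, for illustration only.
     * Returns numRegions - 1 boundaries, i.e. numRegions key ranges.
     */
    static List<byte[]> evenSplitKeys(int numRegions, int numBuckets) {
        if (numRegions < 1 || numBuckets > 100 || numRegions > numBuckets) {
            throw new IllegalArgumentException("need 1 <= numRegions <= numBuckets <= 100");
        }
        List<byte[]> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int boundary = (int) ((long) i * numBuckets / numRegions);
            keys.add(String.format(Locale.ROOT, "%02d", boundary)
                    .getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long oneWeek = 200L * 256L * mb; // 200 regions, assumed full at 256 MB each

        // Same data volume, larger maximum region size (4 GB is an example value).
        System.out.println("regions per week at 256 MB: "
                + estimateRegionCount(oneWeek, 256L * mb));
        System.out.println("regions per week at 4 GB:   "
                + estimateRegionCount(oneWeek, 4096L * mb));

        // ~30 key ranges for a 10-machine cluster, as suggested in the thread.
        System.out.println("split keys for 30 regions:  "
                + evenSplitKeys(30, 100).size());
    }
}
```

In HBase itself the split threshold is the hbase.hregion.max.filesize
property, and a byte[][] of split keys like the one built here is what
HBaseAdmin.createTable(HTableDescriptor, byte[][]) accepts when creating
a pre-split table.
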
On 09/10/2012 04:45 AM, Oleg Ruchovets wrote:
> Great
> That is actually what I am thinking about too.
> What is the best practice for choosing the HFile size?
> What is the penalty for making it very large?
>
> Thanks
> Oleg.
>
> On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Oleg,
>>
>> If the root issue is a growing number of regions, why not control that
>> instead of a way to control the Reducer count? You could, for example,
>> raise the split-point sizes for HFiles, to not have it split too much,
>> and hence have larger but fewer regions?
>>
>> Given that you have 10 machines, I'd go this way rather than ending up
>> with a lot of regions causing issues with load.
>>
>> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
>> wrote:
>>> Hi ,
>>> I am using bulk loading to write my data to hbase.
>>>
>>> It works fine, but the number of regions is growing very rapidly.
>>> After loading ONE WEEK of data I got 200 regions (I am going to store
>>> years of data).
>>> As a result, the job that writes data to HBase has a number of REDUCERS
>>> equal to the number of REGIONS.
>>> So after loading only one WEEK of data I have 200 reducers.
>>>
>>> Questions:
>>> How can I resolve the problem of a constantly growing number of reducers
>>> when using bulk loading and TotalOrderPartitioner?
>>> I have a 10-machine cluster and I think I should have ~30 reducers.
>>>
>>> Thanks in advance.
>>> Oleg.
>>
>>
>> --
>> Harsh J
>>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
>
> <http://www.uci.cu>
> <http://www.facebook.com/universidad.uci>
> <http://www.flickr.com/photos/universidad_uci>

--
Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
about.me/marcosortiz <http://about.me/marcosortiz>
My Blog <http://marcosluis2186.posterous.com>
Tumblr's blog <http://marcosortiz.tumblr.com/>
@marcosluis2186 <http://twitter.com/marcosluis2186>