HBase >> mail # user >> bulk loading regions number


Oleg Ruchovets 2012-09-10, 08:19
Harsh J 2012-09-10, 08:24
Oleg Ruchovets 2012-09-10, 08:45
Harsh J 2012-09-10, 17:22
Re: bulk loading regions number
Well, the default maximum size for a region is 256 MB, so if you want to
store a lot of data, you should consider
increasing that value.
With the preSplit() method, you can control how the splitting is done.
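
To make the trade-off concrete, here is a rough sketch in plain Java (no HBase dependency) of how the maximum region size drives the region count, and of how evenly spaced split keys for a pre-split table could be computed. Everything in it is an assumption for illustration: the class and method names are hypothetical, the 50 GB weekly volume is not a figure from this thread, and the zero-padded numeric row keys are just one possible key layout. The size limit I mean is the one I believe is set through `hbase.hregion.max.filesize`, and keys like these could then be handed to a table-creation call that accepts split keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RegionSizing {

    /**
     * Rough estimate of how many regions are needed to hold totalBytes
     * when a region is split once it grows past maxRegionBytes.
     * Ignores compression, compactions and uneven key distribution.
     */
    static long estimateRegions(long totalBytes, long maxRegionBytes) {
        if (totalBytes < 0 || maxRegionBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division, with at least one region for an empty table.
        return Math.max(1, (totalBytes + maxRegionBytes - 1) / maxRegionBytes);
    }

    /**
     * Evenly spaced split keys over the numeric key space [0, keySpace),
     * rendered as zero-padded decimal strings of the given width.
     * numRegions regions need numRegions - 1 split keys.
     * Assumes keySpace * numRegions fits in a long.
     */
    static List<String> evenSplitKeys(int numRegions, long keySpace, int width) {
        if (numRegions < 1 || keySpace < numRegions || width < 1) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = (keySpace * i) / numRegions;
            keys.add(String.format(Locale.ROOT, "%0" + width + "d", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long gb = 1024 * mb;
        long oneWeek = 50 * gb; // hypothetical weekly volume, not from the thread

        System.out.println(estimateRegions(oneWeek, 256 * mb)); // prints 200
        System.out.println(estimateRegions(oneWeek, 4 * gb));   // prints 13
        System.out.println(evenSplitKeys(4, 1000, 4));          // prints [0250, 0500, 0750]
    }
}
```

With these made-up numbers, raising the limit from 256 MB to 4 GB takes the same week of data from about 200 regions down to about 13, and since the bulk-load job runs one reducer per region, the reducer count drops with it.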

On 09/10/2012 04:45 AM, Oleg Ruchovets wrote:
> Great,
>    that is actually what I was thinking about too.
> What is the best practice for choosing the HFile size?
> What is the penalty for making it very large?
>
> Thanks
> Oleg.
>
> On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Oleg,
>>
>> If the root issue is a growing number of regions, why not control that
>> instead of looking for a way to control the Reducer count? You could, for
>> example, raise the split-point sizes for HFiles so that regions do not split
>> as often, and hence have larger but fewer regions.
>>
>> Given that you have 10 machines, I'd go this way rather than ending up
>> with a lot of regions causing issues with load.
>>
>> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
>> wrote:
>>> Hi,
>>>    I am using bulk loading to write my data to HBase.
>>>
>>> It works fine, but the number of regions is growing very rapidly.
>>> After loading ONE WEEK of data I got 200 regions (I am going to store years of
>>> data).
>>> As a result, the job which writes data to HBase has a number of REDUCERS equal
>>> to the number of REGIONS.
>>> So after loading only one WEEK of data I have 200 reducers.
>>>
>>> Questions:
>>>     How can I resolve the problem of a constantly growing number of reducers
>>> when using bulk loading and TotalOrderPartitioner?
>>>   I have a 10-machine cluster and I think I should have ~30 reducers.
>>>
>>> Thanks in advance.
>>> Oleg.
>>
>>
>> --
>> Harsh J
>>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

--

Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
about.me/marcosortiz <http://about.me/marcosortiz>
My Blog <http://marcosluis2186.posterous.com>
Tumblr's blog <http://marcosortiz.tumblr.com/>
@marcosluis2186 <http://twitter.com/marcosluis2186>
