Oleg Ruchovets 2012-09-10, 08:19
Harsh J 2012-09-10, 08:24
Oleg Ruchovets 2012-09-10, 08:45
Harsh J 2012-09-10, 17:22
Re: bulk loading regions number
Marcos Ortiz 2012-09-10, 13:17
Well, the default maximum size for a region is 256 MB, so if you want to
store a lot of data, you should consider increasing that value.
With the preSplit() method, you can control how the splitting is done.

On 09/10/2012 04:45 AM, Oleg Ruchovets wrote:
> Great
> That is actually what I am thinking about too.
> What is the best practice to choose HFile size?
> What is the penalty to define it very big?
>
> Thanks
> Oleg.
>
> On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Oleg,
>>
>> If the root issue is a growing number of regions, why not control that
>> instead of a way to control the Reducer count? You could, for example,
>> raise the split-point sizes for HFiles, to not have it split too much,
>> and hence have larger but fewer regions?
>>
>> Given that you have 10 machines, I'd go this way rather than ending up
>> with a lot of regions causing issues with load.
>>
>> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[EMAIL PROTECTED]>
>> wrote:
>>> Hi,
>>> I am using bulk loading to write my data to HBase.
>>>
>>> It works fine, but the number of regions is growing very rapidly.
>>> Entering ONE WEEK of data I got 200 regions (I am going to save years of
>>> data).
>>> As a result, the job which writes data to HBase has a REDUCERS number equal
>>> to the REGIONS number.
>>> So entering only one WEEK of data I have 200 reducers.
>>>
>>> Questions:
>>> How to resolve the problem of a constantly growing reducers number using
>>> bulk loading and TotalOrderPartition.
>>> I have a 10 machine cluster and I think I should have ~ 30 reducers.
>>>
>>> Thanks in advance.
>>> Oleg.
>>
>>
>> --
>> Harsh J
>>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

--
Marcos Luis Ortíz Valmaseda
*Data Engineer && Sr. System Administrator at UCI*
about.me/marcosortiz <http://about.me/marcosortiz>
My Blog <http://marcosluis2186.posterous.com>
Tumblr's blog <http://marcosortiz.tumblr.com/>
@marcosluis2186 <http://twitter.com/marcosluis2186>
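
To put rough numbers on the advice in this thread (larger regions mean fewer regions, and so fewer reducers in the bulk-load job), here is a minimal back-of-the-envelope sketch. It is not HBase code. It assumes that the 200 regions Oleg saw after one week were each close to the 256 MB limit, so the weekly volume is only an upper-bound guess, and the helper name estimated_regions is made up for illustration. In HBase itself the split threshold is normally set through the hbase.hregion.max.filesize property.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def estimated_regions(total_bytes: int, max_region_bytes: int) -> int:
    """Rough region count if every region is filled up to its maximum size."""
    # Integer ceiling division; always report at least one region.
    return max(1, -(-total_bytes // max_region_bytes))


# Assumption: 200 regions per week, each near the 256 MB limit (about 50 GB/week).
weekly_bytes = 200 * 256 * MB

for label, max_size in [("256 MB", 256 * MB), ("1 GB", 1 * GB), ("4 GB", 4 * GB)]:
    per_week = estimated_regions(weekly_bytes, max_size)
    per_year = estimated_regions(52 * weekly_bytes, max_size)
    print(f"max region size {label}: ~{per_week} regions/week, ~{per_year} regions/year")
```

Under that assumption, 4 GB regions would mean about 13 new regions per week instead of 200. Real numbers depend on compression, on how full each region is when it splits, and on the row key distribution, so treat the output as an order-of-magnitude estimate only.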