Re: Bulk loading (and/or major compaction) causing OOM
Merging is not an option for us, because we cannot afford to bring our
cluster down.  Also, we are not yet convinced that our cluster can handle
such large regions due to all the OOM issues we are seeing when trying to
bring new, bigger regions online.
On Sat, Dec 8, 2012 at 3:42 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:

>
> On 12/08/2012 11:50 AM, Bryan Beaudreault wrote:
>
> Thanks for the responses guys.  Responses inline
>
>
>  When you are doing the bulk load, are you pre-splitting your regions?
> What OS are you using and what version of Java?
>
>  Yes, regions are pre-split.  We calculated them using M/R before attempting
> to bulk load the data.  We've done this before with smaller sizes and it
> has worked fine.
>
> CentOS 5, Java 1.6.0_27
>
>
>  Yes, my friend. You should know about all the benefits in the new stable
> release (0.94.3), so this is my first piece of advice.
>
>  We use CDH currently, so we are working to move to cdh4.1.2, which is on the
> 0.92.x branch.
>
>  Great to hear.
>
>
> On Fri, Dec 7, 2012 at 4:48 PM, Stack <[EMAIL PROTECTED]> wrote:
>
>
>  On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
>
>
>  We have a couple tables that had thousands of regions due to the size of
> the data in them.  We recently changed them to have larger regions (nearly
> 4GB).  We are trying to bulk load these in now, but every time we do our
> servers die with OOM.
>
>
>
>  You mean, you are reloading the data that once was in thousands of regions
> instead into new regions of 4GB in size?
>
> I'd be surprised if the actual bulk load brings on the OOME.
>
>
>
>  That's correct.  The exact same data is currently live in an older table
> with thousands of smaller regions.  Once we get these loaded we will swap
> in the new table and delete the old.
>
>
>
>     The logs seem to show that there is always a major compaction happening
> when the OOM happens.  This is among other normal usage from a variety of
> apps in our product, so the memstores, block cache, etc are all active
> during this time.
>
>
>
>  Could you turn off major compaction during the bulk load to see if that
> helps?
>
> Automatic major compactions are actually off for our cluster. It looks
> like the servers start doing minor compactions as data is loaded in, and
> that is where we first saw the OOM issues.  So we tried forcing major
> compactions earlier instead.
>
>
>   I was reading through the compaction code and it doesn't look like it
> should take up much memory (depending on how the Reader class works) .
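
To illustrate why a compaction need not hold whole store files in memory, here is a minimal sketch of a streaming k-way merge over sorted inputs. It is not HBase's compaction code: the class name `StreamingMerge` is hypothetical, and in-memory lists of row keys stand in for store files. Under those assumptions, the heap holds only one pending key per input, so the merge state grows with the number of files being compacted rather than with their size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class StreamingMerge {
    // One cursor per sorted input: the current key plus the iterator it came from.
    private static final class Cursor implements Comparable<Cursor> {
        final String key;
        final Iterator<String> rest;

        Cursor(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }

        @Override
        public int compareTo(Cursor other) {
            return key.compareTo(other.key);
        }
    }

    // Merges already-sorted inputs into one sorted list.
    // Only one pending key per input sits in the heap at any time.
    public static List<String> merge(List<List<String>> sortedInputs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>();
        for (List<String> input : sortedInputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> out = new ArrayList<String>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            // A real compaction would stream this entry to the new file
            // instead of collecting it in a list.
            out.add(c.key);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three hypothetical "store files", each already sorted by row key.
        List<List<String>> files = Arrays.asList(
            Arrays.asList("row1", "row4"),
            Arrays.asList("row2", "row5"),
            Arrays.asList("row3"));
        System.out.println(merge(files));
    }
}
```

Whether the real Reader behaves this way depends on how much it buffers per file (blocks, indexes), which is exactly the open question in this thread.
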
>
>
>
> Yes.
>
> Are there lots of storefiles under each region?
>
> Yes actually, the bulk loaded data usually seems to contain approximately
> 5-10 files per region, likely due to the output settings of the M/R job
> that creates this data.
>
>
>
>    Does anyone with more knowledge of these internals know how bulk load
> and major compaction work with regard to memory?
>
> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> version 0.90.4 (I know, I know, we're working to upgrade).
>
>
>  How much have you given hbase?
>
> If you look at your cluster monitoring, are you swapping?
>
> The regionservers are carrying how many regions per server?
>
>
>  The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
> which 1GB goes to DN and rest to OS)
> Swapping is disabled.
> We have around 350 regions per RS currently. What we're doing now with this
> table is part of our effort to decrease the number of regions across all
> tables.  We need to do it with minimal downtime though so it is slow going.
>  We are aiming for around 200 regions per RS.
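
As a rough way to reason about the numbers above, the following sketch works out a heap budget for one RegionServer. The 5GB heap and the 350 and 200 region counts come from this thread. The memstore and block cache fractions (0.4 and 0.2) are assumptions for illustration only and should be replaced with the values actually configured on the cluster. The class and method names are hypothetical.

```java
public class HeapBudget {
    // Heap left after reserving the memstore and block cache fractions, in MB.
    static double remainingMb(double heapMb, double memstoreFraction, double blockCacheFraction) {
        return heapMb * (1.0 - memstoreFraction - blockCacheFraction);
    }

    // Average memstore budget per region if the global cap is shared evenly, in MB.
    static double memstorePerRegionMb(double heapMb, double memstoreFraction, int regions) {
        return heapMb * memstoreFraction / regions;
    }

    public static void main(String[] args) {
        double heapMb = 5 * 1024;         // 5GB heap, from the thread
        double memstoreFraction = 0.4;    // ASSUMED global memstore cap
        double blockCacheFraction = 0.2;  // ASSUMED block cache share

        System.out.printf("Heap left for everything else: %.0f MB%n",
            remainingMb(heapMb, memstoreFraction, blockCacheFraction));
        // Region counts from the thread: 350 today, 200 as the target.
        for (int regions : new int[] {350, 200}) {
            System.out.printf("Memstore share per region at %d regions: %.2f MB%n",
                regions, memstorePerRegionMb(heapMb, memstoreFraction, regions));
        }
    }
}
```

Under these assumed fractions, whatever compactions, RPC buffers, and store file indexes need has to fit in the remaining portion of the heap, which is one reason the region count and store file count per server matter here.
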
>
>  Yes, it would be nice to see fewer regions per server. Have you considered
> merging some adjacent regions?
>
>   St.Ack
>
>
>