Re: Bulk loading (and/or major compaction) causing OOM
Thanks for the responses, guys.  Responses inline.

> When you are doing the bulk load, did you pre-split your regions?
> What OS are you using and what version of Java?

Yes, regions are pre-split.  We calculated them using M/R before attempting
to bulk load the data.  We've done this before with smaller sizes and it
has worked fine.
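
Roughly, the table creation looks like this (a minimal sketch against the
0.90 client API; the table name, column family, and split keys below are
placeholders, and the real split points come out of that M/R job):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePresplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("new_table"); // placeholder name
        desc.addFamily(new HColumnDescriptor("d"));                // placeholder family

        // Split keys computed ahead of time (ours come from an M/R job that
        // samples the data); these literals are placeholders only.
        byte[][] splitKeys = new byte[][] {
            Bytes.toBytes("row_1000000"),
            Bytes.toBytes("row_2000000"),
            Bytes.toBytes("row_3000000"),
        };

        // One region per gap between split keys, so the bulk load never has
        // to split regions on the fly.
        admin.createTable(desc, splitKeys);
      }
    }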

CentOS 5, Java 1.6.0_27.

> Yes, my friend. You should look into all the benefits of the new stable
> release (0.94.3), so this is the first piece of advice.

We use CDH currently, so we are working to move to CDH 4.1.2, which is on
the 0.92.x branch.

On Fri, Dec 7, 2012 at 4:48 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
> <[EMAIL PROTECTED]>wrote:
>
> > We have a couple tables that had thousands of regions due to the size of
> > the data in them.  We recently changed them to have larger regions (nearly
> > 4GB).  We are trying to bulk load these in now, but every time we do our
> > servers die with OOM.
> >
> >
> You mean, you are reloading the data that once was in thousands of regions
> instead into new regions of 4GB in size?
>
> I'd be surprised if the actual bulk load brings on the OOME.
>
>
That's correct.  The exact same data is currently live in an older table
with thousands of smaller regions.  Once we get these loaded we will swap
in the new table and delete the old.
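
The load itself is just the standard completebulkload path, roughly like
this (a sketch against the 0.90 API; the table name and HFile directory are
placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadHFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "new_table");  // placeholder table name

        // Hands the prepared HFiles to the regionservers, which adopt them
        // in place; the load step itself moves files rather than data.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/user/hbase/bulkload/new_table"), table); // placeholder path
      }
    }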
>
>

> > The logs seem to show that there is always a major compaction happening
> > when the OOM happens.  This is among other normal usage from a variety of
> > apps in our product, so the memstores, block cache, etc are all active
> > during this time.
> >
> >
>
> Could you turn off major compaction during the bulk load to see if that
> helps?
>
Automatic major compactions are actually off for our cluster.  It looks
like the regionservers start doing minor compactions as data is loaded in,
and that is where we first saw the OOM issues, so we tried forcing major
compactions earlier instead.
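
(For context, "off" just means hbase.hregion.majorcompaction is set to 0 in
hbase-site.xml; when we do want one we request it ourselves, either with
major_compact from the shell or something along these lines, where the table
name is a placeholder:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RequestMajorCompaction {
      public static void main(String[] args) throws Exception {
        // With hbase.hregion.majorcompaction=0, HBase never schedules major
        // compactions on its own; they only run when requested.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Asynchronously queues a major compaction for every region of the
        // table (placeholder table name).
        admin.majorCompact("new_table");
      }
    }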

>
>
> > I was reading through the compaction code and it doesn't look like it
> > should take up much memory (depending on how the Reader class works).
> >
>
>
> Yes.
>
> Are there lots of storefiles under each region?
>
Yes actually, the bulk-loaded data usually seems to contain approximately
5-10 files per region, likely due to the output settings of the M/R job
that creates this data.
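
If it matters, I suspect we could get that down to one HFile per family per
region per run by having the job call HFileOutputFormat.configureIncrementalLoad
against the live table, along the lines of this sketch (the paths, table name,
and the elided mapper are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PrepareHFilesJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "prepare-hfiles");
        job.setJarByClass(PrepareHFilesJob.class);

        // job.setMapperClass(...): a mapper emitting ImmutableBytesWritable
        // row keys and KeyValue values, elided here.
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        FileInputFormat.addInputPath(job, new Path("/user/hbase/input"));                // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/user/hbase/bulkload/new_table")); // placeholder

        // Reads the table's current region boundaries, installs a
        // TotalOrderPartitioner over them, and sets one reducer per region,
        // so each run writes one HFile per column family per region.
        HTable table = new HTable(conf, "new_table");  // placeholder table name
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }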
>
>
> >  Does anyone with more knowledge of these internals know how bulk load
> > and major compaction work with regard to memory?
> >
> > We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> > version 0.90.4 (I know, I know, we're working to upgrade).
> >
>
> How much have you given hbase?
>
> If you look at your cluster monitoring, are you swapping?
>
> The regionservers are carrying how many regions per server?
>

The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
which 1GB goes to the DN and the rest to the OS).
Swapping is disabled.
We have around 350 regions per RS currently.  What we're doing now with this
table is part of our effort to decrease the number of regions across all
tables.  We need to do it with minimal downtime, though, so it is slow going.
We are aiming for around 200 regions per RS.
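
For what it's worth, a rough back-of-the-envelope check on why 350 regions
per RS feels tight with a 5GB heap, if I have the 0.90 defaults right (64MB
memstore flush size, 0.4 global memstore upper limit, 0.2 block cache
fraction) and assuming a single column family:

    global memstore limit               = 0.4 * 5GB  = 2GB
    block cache                         = 0.2 * 5GB  = 1GB
    memstore demand, 350 busy regions   = 350 * 64MB = ~22GB

    2GB shared across 350 active memstores is only ~6MB per region before
    the global limit forces flushes, which means lots of small storefiles
    and a busy compaction queue on top of the bulk-load traffic.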

>
> St.Ack
>