HBase, mail # user - Bulk loading (and/or major compaction) causing OOM


Re: Bulk loading (and/or major compaction) causing OOM
Marcos Ortiz 2012-12-08, 20:42

On 12/08/2012 11:50 AM, Bryan Beaudreault wrote:
> Thanks for the responses guys.  Responses inline
>
>> When you are doing the bulk load, are you pre-splitting your regions?
>> What OS are you using and what version of Java?
> Yes, regions are pre-split.  We calculated them using M/R before attempting
> to bulk load the data.  We've done this before with smaller sizes and it
> has worked fine.
>
> Centos5, java 1.6.0_27
>
>> Yes, my friend. You should know all the benefits of the new stable
>> release (0.94.3), so this is my first advice.
> We use CDH currently, so we are working to move to cdh4.1.2, which is on the
> 0.92.x branch.
Great to hear.
>
> On Fri, Dec 7, 2012 at 4:48 PM, Stack <[EMAIL PROTECTED]> wrote:
>
>> On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
>> <[EMAIL PROTECTED]>wrote:
>>
>>> We have a couple tables that had thousands of regions due to the size of
>>> the data in them.  We recently changed them to have larger regions (nearly
>>> 4GB).  We are trying to bulk load these in now, but every time we do our
>>> servers die with OOM.
>>>
>>>
>> You mean, you are reloading the data that once was in thousands of regions
>> instead into new regions of 4GB in size?
>>
>> I'd be surprised if the actual bulk load brings on the OOME.
>>
>>
> That's correct.  The exact same data is currently live in an older table
> with thousands of smaller regions.  Once we get these loaded we will swap
> in the new table and delete the old.
>
>
>>
>>> The logs seem to show that there is always a major compaction happening
>>> when the OOM happens.  This is among other normal usage from a variety of
>>> apps in our product, so the memstores, block cache, etc are all active
>>> during this time.
>>>
>>>
>> Could you turn off major compaction during the bulk load to see if that
>> helps?
>>
> Automatic major compactions are actually off for our cluster. It looks
> like they start doing minor compactions as data is loaded in, and that is
> where we first saw the OOM issues.  So we tried forcing major compactions
> earlier instead.
>
>>
>>> I was reading through the compaction code and it doesn't look like it
>>> should take up much memory (depending on how the Reader class works) .
>>>
>>
>> Yes.
>>
>> Are there lots of storefiles under each region?
>>
> Yes actually, the bulk loaded data usually seems to contain approximately
> 5-10 files per region, likely due to the output settings of the M/R job
> that creates this data.
>
>
>>
>>>   Does anyone with more knowledge of these internals know how bulk load
>>> and major compaction work with regard to memory?
>>>
>>> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
>>> version 0.90.4 (I know, I know, we're working to upgrade).
>>>
>> How much have you given hbase?
>>
>> If you look at your cluster monitoring, are you swapping?
>>
>> The regionservers are carrying how many regions per server?
>>
> The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
> which 1GB goes to the DN and the rest to the OS)
> Swapping is disabled.
> We have around 350 regions per RS currently. What we're doing now with this
> table is part of our effort to decrease the number of regions across all
> tables.  We need to do it with minimal downtime though so it is slow going.
>   We are aiming for around 200 regions per RS.
Yes, it would be nice to see fewer regions per server. Have you
considered merging some adjacent
regions?
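
To put the region counts above in perspective, here is a rough back-of-the-envelope sketch in Python. It only divides the 5GB heap mentioned in this thread by the current (~350) and target (~200) region counts. The helper name `heap_per_region_mb` and the 40% memstore share are assumptions made for illustration; they are not values read from this cluster's configuration.

```python
# Back-of-the-envelope heap budget per region on one RegionServer.
# Figures taken from this thread: 5 GB heap, ~350 regions now, ~200 targeted.
# The memstore fraction below is an illustrative assumption, not a value
# read from the cluster's hbase-site.xml.

GB = 1024 ** 3
MB = 1024 ** 2


def heap_per_region_mb(heap_bytes, regions, fraction=1.0):
    """Average share of `fraction` of the heap available per region, in MB."""
    if regions <= 0:
        raise ValueError("regions must be positive")
    return heap_bytes * fraction / regions / MB


heap = 5 * GB
for regions in (350, 200):
    whole = heap_per_region_mb(heap, regions)
    # Assumed: 40% of the heap is shared by all memstores.
    memstore = heap_per_region_mb(heap, regions, fraction=0.4)
    print(f"{regions} regions: {whole:.1f} MB heap/region, "
          f"{memstore:.1f} MB memstore/region")
```

This is only an average and ignores the block cache and compaction buffers, but it suggests why going from 350 to 200 regions per server leaves noticeably more headroom per region on the same heap.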

>
>> St.Ack
>>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci

--

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>
