Re: Struggling with Region Servers Running out of Memory
Are you writing fat cells?

Did you try raising the heap size and seeing if it still crashes?

Regards
Ram
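
For reference, raising the region server heap on a CDH4-era (0.92.x) cluster is done in conf/hbase-env.sh. This is a minimal sketch; the 12 GB figure and the CMS tuning flags are illustrative assumptions, not values taken from this thread:

```shell
# conf/hbase-env.sh -- sketch only; 12288 MB and the GC flags below are
# illustrative, not recommendations from this thread.

# HBASE_HEAPSIZE is specified in MB on 0.92-era releases.
export HBASE_HEAPSIZE=12288

# CMS with an earlier occupancy trigger was a common way to shorten
# stop-the-world pauses that otherwise expire the ZooKeeper session:
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```

Restart the region servers after editing so the new heap setting takes effect. Note that a bigger heap can also lengthen full-GC pauses, so GC tuning usually matters as much as the heap size itself.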

On Wed, Oct 31, 2012 at 6:10 AM, Jeff Whiting <[EMAIL PROTECTED]> wrote:

> So I'm looking at Ganglia, so the numbers are somewhat approximate (this is
> for a server that just crashed about a half hour ago due to running out of
> memory):
>
> Store files are hovering just below 1k.  Over the last 24 hours it has
> varied by about 100 files (I'm looking at hbase.regionserver.storefiles).
>
> Block cache count is about 24k varied by about 2k.  Our block cache free
> goes between 0.7G and 0.4G.  It looks like we have almost 3G free after
> restarting a region server.
>
> The evicted block count went from 210k to 320k over a 24 hour period.  Hit
> ratio is close to 100 (the graph isn't very detailed so I'm guessing it is
> like 98-99%).
>
> Block cache size stays at about 2GB.
>
> ~Jeff
>
>
>
> On 10/30/2012 6:21 PM, Jeff Whiting wrote:
>
>> We have no coprocessors.  We are running replication from this cluster to
>> another one.
>>
>> What is the best way to see how many store files we have? Or checking on
>> the block cache?
>>
>> ~Jeff
>>
>> On 10/30/2012 12:43 AM, ramkrishna vasudevan wrote:
>>
>>> Hi
>>>
>>> Are you using any coprocessors? Can you see how many store files are
>>> created?
>>>
>>> The number of blocks getting cached will give you an idea too.
>>>
>>> Regards
>>> Ram
>>>
>>> On Tue, Oct 30, 2012 at 4:25 AM, Jeff Whiting <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> We have 6 region servers, each given 10G of memory for HBase.  Each
>>>> region server has an average of about 100 regions, and across the
>>>> cluster we are averaging about 100 requests / second with a pretty even
>>>> read / write load.  We are running CDH4 (0.92.1-cdh4.0.1, rUnknown).
>>>>
>>>> Looking over our load and our requests, I feel that the 10GB of memory
>>>> should be enough to handle the load and that we shouldn't really be
>>>> pushing the memory limits.
>>>>
>>>> However, what we are seeing is that our memory usage climbs slowly until
>>>> the region server starts sputtering due to GC issues; it will eventually
>>>> get timed out by ZooKeeper and be killed.
>>>>
>>>> We'll see aborts like this in the log:
>>>> 2012-10-29 08:10:52,132 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>> ABORTING region server ds5.h1.ut1.qprod.net,60020,1351233245547:
>>>> Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException:
>>>> Server REPORT rejected; currently processing
>>>> ds5.h1.ut1.qprod.net,60020,1351233245547 as dead server
>>>> 2012-10-29 08:10:52,250 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>> RegionServer abort: loaded coprocessors are: []
>>>> 2012-10-29 08:10:52,392 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>> ABORTING region server ds5.h1.ut1.qprod.net,60020,1351233245547:
>>>> regionserver:60020-0x13959edd45934cf
>>>> regionserver:60020-0x13959edd45934cf received
>>>> expired from ZooKeeper, aborting
>>>> 2012-10-29 08:10:52,401 FATAL
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>> RegionServer abort: loaded coprocessors are: []
>>>>
>>>> Which are "caused" by:
>>>> 2012-10-29 08:07:40,646 WARN org.apache.hadoop.hbase.util.Sleeper: We
>>>> slept 29014ms instead of 3000ms, this is likely due to a long garbage
>>>> collecting pause and it's usually bad, see
>>>> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
>>>> 2012-10-29 08:08:39,074 WARN org.apache.hadoop.hbase.util.Sleeper: