Re: Memory leak in HBase replication?
Jean-Daniel Cryans 2013-07-17, 16:33
Those puts should get cleared right away, so it could mean that they
live in memory... which usually points to very full IPC queues. If you
jstack those region servers, are all the handler threads busy? What is in
the log before it starts doing full GCs? Can we see it?
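
One way to run the handler check above is to save the jstack output to a file and count handler threads by state. The following is only a rough sketch, not an official tool. It assumes the handler threads have names containing "IPC Server handler" (for example "IPC Server handler 3 on 60020"), so adjust the pattern if your dump shows different names. The function name count_handler_states is made up for this example.

```python
import re
from collections import Counter

# Assumption: handler threads are named like "IPC Server handler 3 on 60020".
# Change this pattern if the thread names in your dump differ.
HANDLER_NAME = re.compile(r'^"[^"]*IPC Server handler[^"]*"')
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State: (\w+)')


def count_handler_states(dump_text):
    """Count handler threads per Java thread state in a jstack dump."""
    states = Counter()
    in_handler = False
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # A quoted name at column 0 starts a new thread entry.
            in_handler = HANDLER_NAME.match(line) is not None
            continue
        if in_handler:
            match = THREAD_STATE.search(line)
            if match:
                states[match.group(1)] += 1
                # Only the first state line of each thread is counted.
                in_handler = False
    return states
```

Calling count_handler_states(open("rs.jstack").read()) on a saved dump (the file name is a placeholder) returns a Counter keyed by state. If nearly every handler is RUNNABLE or BLOCKED and few are WAITING, the handlers are probably saturated and calls are piling up in the queue. The stack frames under each thread still need to be read by hand to see what the handlers are doing.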
On Wed, Jul 17, 2013 at 9:06 AM, Anusauskas, Laimonas
<[EMAIL PROTECTED]> wrote:
> I am fairly new to HBase. We are trying to set up an OpenTSDB system here and have just started setting up production clusters. We have 2 datacenters, on the west and east coasts, and we want to have 2 active-passive HBase clusters with HBase replication between them. Right now each cluster has 4 nodes (1 master, 3 slaves); we will add more nodes as the load ramps up. Setup went fine and data started replicating from one cluster to the other, but as soon as the load picked up, regionservers on the slave cluster started running out of heap and getting killed. I increased the heap size on the regionservers from the default 1000M to 2000M, but the result was the same. I also updated HBase from the version that came with Hortonworks (hbase-0.94.6.1.3.0.0-107-security) to hbase-0.94.9, and it is still the same.
> The load on the source cluster is still very light. There is one active table, tsdb, and its compressed size is less than 200M. But as soon as I start replication, the usedHeapMB metric on the regionservers in the slave cluster starts going up, then full GC kicks in, and eventually the process is killed because "-XX:OnOutOfMemoryError=kill -9 %p" is set.
> I took a heap dump and ran Eclipse Memory Analyzer, and here is what it reported:
> One instance of "java.util.concurrent.LinkedBlockingQueue" loaded by "<system class loader>" occupies 1,411,643,656 (67.87%) bytes. The instance is referenced by org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server @ 0x7831c37f0 , loaded by "sun.misc.Launcher$AppClassLoader @ 0x783130980". The memory is accumulated in one instance of "java.util.concurrent.LinkedBlockingQueue$Node" loaded by "<system class loader>".
> 502,763 instances of "org.apache.hadoop.hbase.client.Put", loaded by "sun.misc.Launcher$AppClassLoader @ 0x783130980" occupy 244,957,616 (11.78%) bytes.
> There is nothing in the logs until full GC kicks in, at which point all hell breaks loose: things start timing out, etc.
> I did a bunch of searching but came up with nothing. I could add more RAM to the nodes and increase the heap size, but I suspect that will only prolong the time until the heap gets full.
> Any help would be appreciated.