Ramkrishna S Vasudevan
2011-12-01, 02:51
Vladimir Rodionov
2011-12-01, 06:22
Stack
2011-12-01, 19:26
Kihwal Lee
2011-12-01, 20:20
bijieshan
2011-12-02, 07:37
Ted Yu
2011-12-05, 18:49
Stack
2011-12-05, 20:03
lars hofhansl
2011-12-05, 20:05
Shrijeet Paliwal
2011-12-05, 20:08
Stack
2011-12-05, 20:18
Stack
2011-12-05, 20:19
-
RE: Suspected memory leak - Ramkrishna S Vasudevan, 2011-12-01, 02:51
Adding the dev list to get some suggestions.

Regards
Ram

-----Original Message-----
From: Shrijeet Paliwal [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 01, 2011 8:08 AM
To: [EMAIL PROTECTED]
Cc: Gaojinchao; Chenjian
Subject: Re: Suspected memory leak

Jieshan,
We backported https://issues.apache.org/jira/browse/HBASE-2937 to 0.90.3.

-Shrijeet

2011/11/30 bijieshan <[EMAIL PROTECTED]>
> Hi Shrijeet,
>
> I think that JIRA is relevant to trunk, but not to 0.90.x, because there is no
> timeout mechanism in 0.90.x. Right?
> We found this problem in 0.90.x.
>
> Thanks,
>
> Jieshan.
>
> -----Original Message-----
> From: Shrijeet Paliwal [mailto:[EMAIL PROTECTED]]
> Sent: December 1, 2011 10:26
> To: [EMAIL PROTECTED]
> Cc: Gaojinchao; Chenjian
> Subject: Re: Suspected memory leak
>
> Gaojinchao,
>
> I had filed this some time ago:
> https://issues.apache.org/jira/browse/HBASE-4633
> But after some recent insights into our application code, I am inclined to
> think the leak (or memory 'hold') is in our application. But it will be good to
> check it out either way.
> I need to update the JIRA with my saga. See if the description of the issue I
> posted there matches yours. If not, maybe you can update it with your story
> in detail.
>
> -Shrijeet
>
> 2011/11/30 Gaojinchao <[EMAIL PROTECTED]>
>
> > I have noticed some memory leak problems in my HBase client.
> > RES has increased to 27g:
> > PID   USER PR NI VIRT  RES SHR  S %CPU %MEM TIME+     COMMAND
> > 12676 root 20 0  30.8g 27g 5092 S 2    57.5 587:57.76 /opt/java/jre/bin/java -Djava.library.path=lib/.
> >
> > But I am not sure whether the leak comes from the HBase client jar itself or
> > just from our client code.
> >
> > These are some of the JVM parameters:
> > -Xms15g -Xmn12g -Xmx15g -XX:PermSize=64m -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65
> > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=1
> > -XX:+CMSParallelRemarkEnabled
> >
> > Has anyone run into this? I need to continue digging :)
> >
> > From: Gaojinchao
> > Sent: November 30, 2011 11:02
> > To: [EMAIL PROTECTED]
> > Subject: Suspected memory leak
> >
> > In the HBase client process, I found that the heap keeps increasing.
> > I used the command 'cat smaps' to get the heap size.
> > It seems that after the thread pool in HTable has released its unused
> > threads, if you use the put(List) API to put data again, the memory
> > increases.
> >
> > Has anyone run into this?
> >
> > Below is the heap of the HBase client:
> > C3S31:/proc/18769 # cat smaps
> > 4010a000-4709d000 rwxp 00000000 00:00 0 [heap]
> > Size: 114252 kB
> > Rss: 114044 kB
> > Pss: 114044 kB
> >
> > 4010a000-4709d000 rwxp 00000000 00:00 0 [heap]
> > Size: 114252 kB
> > Rss: 114044 kB
> > Pss: 114044 kB
> >
> > 4010a000-48374000 rwxp 00000000 00:00 0 [heap]
> > Size: 133544 kB
> > Rss: 133336 kB
> > Pss: 133336 kB
> >
> > 4010a000-49f20000 rwxp 00000000 00:00 0 [heap]
> > Size: 161880 kB
> > Rss: 161672 kB
> > Pss: 161672 kB
> >
> > 4010a000-4c5de000 rwxp 00000000 00:00 0 [heap]
> > Size: 201552 kB
> > Rss: 201344 kB
> > Pss: 201344 kB
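For readers repeating the 'cat smaps' check above, here is a minimal Java sketch that sums the Rss of the [heap] mapping(s) in /proc/<pid>/smaps instead of reading it by eye. It assumes Linux and the usual smaps layout (a "start-end perms offset dev inode pathname" header line per mapping, followed by "Name: value kB" fields); the class name SmapsHeap is hypothetical. Note that on Linux the [heap] mapping is the native brk/malloc heap, not the Java object heap sized by -Xmx, so growth there points at native (off-heap) allocations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: sums the Rss of the native "[heap]" mapping(s) in a
// /proc/<pid>/smaps listing (Linux only).
class SmapsHeap {
    static long heapRssKb(List<String> smapsLines) {
        long totalKb = 0;
        boolean inHeap = false;
        for (String line : smapsLines) {
            String t = line.trim();
            if (t.matches("[0-9a-fA-F]+-[0-9a-fA-F]+\\s.*")) {
                // Mapping header: "start-end perms offset dev inode [pathname]".
                inHeap = t.endsWith("[heap]");
            } else if (inHeap && t.startsWith("Rss:")) {
                // Field line: "Rss:   201344 kB" -> 201344
                totalKb += Long.parseLong(t.split("\\s+")[1]);
            }
        }
        return totalKb;
    }

    public static void main(String[] args) throws IOException {
        // Pass the client's PID, or nothing to inspect this JVM itself.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Paths.get("/proc", pid, "smaps"));
        System.out.println("[heap] Rss: " + heapRssKb(lines) + " kB");
    }
}
```

Running it periodically against the client's PID gives the same series as the snapshots above, without the manual copy and paste.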
-
RE: Suspected memory leak - Vladimir Rodionov, 2011-12-01, 06:22
You can create several heap dumps of the JVM process in question and compare the heap allocations.
To create a heap dump: jmap -dump:format=b,file=<file>.hprof <pid>
To analyze it:
1. jhat
2. VisualVM
3. any commercial profiler

One note: -Xmn12g??? How long are your minor collection GC pauses?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
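If running jmap against the client is inconvenient, a HotSpot JVM can also dump its own heap. The following is a minimal sketch, assuming a HotSpot-based JVM: it uses com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific rather than standard Java SE, and HeapDumper is a hypothetical name. As far as I know, recent JDKs reject dump file names that do not end in .hprof, and the call fails if the target file already exists.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes an .hprof dump of the current (HotSpot) JVM,
// roughly what `jmap -dump:format=b,file=<file>.hprof <pid>` does from outside.
class HeapDumper {
    static File dump(File target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly=true: only objects still reachable are written.
        // Throws IOException if the target file already exists.
        bean.dumpHeap(target.getAbsolutePath(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        File out = new File(args.length > 0 ? args[0]
                : "client-" + System.currentTimeMillis() + ".hprof");
        dump(out, true);
        System.out.println("wrote " + out.length() + " bytes to " + out);
    }
}
```

Take two dumps some minutes apart and compare them in jhat or VisualVM. Keep in mind that a heap dump shows the small DirectByteBuffer objects but not the native memory behind them, so off-heap growth will not be obvious from the dumps alone.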
-
Re: Suspected memory leak - Stack, 2011-12-01, 19:26
Make sure it's not the issue that Jonathan Payne identified a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack
-
Re: Suspected memory leak - Kihwal Lee, 2011-12-01, 20:20
Adding to the excellent write-up by Jonathan:
Since a finalizer is involved, it takes two GC cycles to collect these objects. Due to a bug (or bugs) in the CMS GC, collection may not happen and the heap can grow really big. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket-related objects were being collected properly. This option forces the concurrent marker to use a single thread. This was for HDFS, but I think the same applies here.

Kihwal
-
Re: Suspected memory leak - bijieshan, 2011-12-02, 07:37
Thank you all.
I think it's the same problem as the one in the link Stack provided: the heap size has stabilized, but the non-heap size keeps growing. So I don't think it's the CMS GC bug.
We have also looked at the contents of the problem memory region; all the records contain info like the following:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.
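The observation above (stable heap, growing non-heap memory) can be checked from inside the client. Below is a minimal sketch using the standard java.lang.management.BufferPoolMXBean (Java 7 and later, so newer than the JVMs in this thread), which reports how much memory the JVM's "direct" and "mapped" buffer pools hold; DirectMemoryMonitor is a hypothetical name. I am assuming that the temporary direct buffers the JDK allocates when a heap buffer is written to a socket channel are counted in the "direct" pool, since in the JDK versions I know of they are created through ByteBuffer.allocateDirect.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: reports the JVM's NIO buffer pools ("direct", "mapped").
// Direct buffers live outside the Java heap, so they grow RES without moving
// the -Xmx heap numbers.
class DirectMemoryMonitor {
    /** Bytes currently held by the "direct" pool, or -1 if not reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    static void print() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-6s count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }

    public static void main(String[] args) {
        print();
        ByteBuffer offHeap = ByteBuffer.allocateDirect(8 * 1024 * 1024); // 8 MB outside the heap
        print(); // the "direct" pool should now be about 8 MB larger
        System.out.println("buffer capacity: " + offHeap.capacity());
    }
}
```

Logging directBytesUsed() next to the GC log would show whether the off-heap growth tracks the put(List) traffic.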
-
Re: Suspected memory leak - Ted Yu, 2011-12-05, 18:49
Lars:
What you proposed below should be close to what Netty does.
Instead of managing the complexity of NIO-related code ourselves, we can delegate to
Netty, as asynchbase does.
This discussion should be under a different thread / JIRA.

sendParam() is called by HBaseClient.call(), which is called by
WritableRpcEngine and SecureRpcEngine.
Can you elaborate on what you think the call hierarchy should be?

Overall, I think we can resolve HBASE-4633 and put further discussion under
https://issues.apache.org/jira/browse/HBASE-4956

Cheers

On Sun, Dec 4, 2011 at 10:08 PM, Lars <[EMAIL PROTECTED]> wrote:
> To Ted... yes, sorry, sendParam.
>
> Any better solution involves changing the code.
>
> I could envision a form of active object where all NIO is handled by a
> small pool of threads, and/or doing chunking into (say) 8k chunks on the
> client. Or both.
>
> In both cases there would be less direct buffer garbage produced by the
> client.
>
> Why is sendParam called directly by the client (app) threads? Is it to
> enforce ordering?
>
> Lastly, -XX:MaxDirectMemorySize should definitely be documented.
>
> -- Lars
>
> Gaojinchao <[EMAIL PROTECTED]> wrote:
>
> >OK. Does anyone have a better solution? Do we need to add this to the book?
> >
> >
> >-----Original Message-----
> >From: Ted Yu [mailto:[EMAIL PROTECTED]]
> >Sent: December 5, 2011 11:39
> >To: [EMAIL PROTECTED]
> >Subject: Re: FeedbackRe: Suspected memory leak
> >
> >Jinchao:
> >Since we found the workaround, can you summarize the following statistics
> >on HBASE-4633?
> >
> >Thanks
> >
> >2011/12/4 Gaojinchao <[EMAIL PROTECTED]>
> >
> >> Yes, I have tested it; the system is fine.
> >> Roughly once an hour, a full GC is triggered.
> >> 10022.210: [Full GC (System) 10022.210: [Tenured:
> >> 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K),
> >> [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75
> >> sys=0.00, real=1.75 secs]
> >> .........
> >>
> >> .........
> >> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K),
> >> 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times:
> >> user=1.90 sys=0.01, real=0.14 secs]
> >> 13624.630: [Full GC (System) 13624.630: [Tenured:
> >> 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K),
> >> [Perm : 19225K->19225K(65536K)], 1.9531660 secs]
> >> [Times: user=1.94 sys=0.00, real=1.96 secs]
> >>
> >> 7543 root 20 0 17.0g 15g 9892 S 0 32.9 1184:34 java
> >> 7543 root 20 0 17.0g 15g 9892 S 1 32.9 1184:34 java
> >>
> >> -----Original Message-----
> >> From: Ted Yu [mailto:[EMAIL PROTECTED]]
> >> Sent: December 5, 2011 9:06
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: FeedbackRe: Suspected memory leak
> >>
> >> Can you try specifying -XX:MaxDirectMemorySize with a moderate value and see
> >> if the leak gets under control?
> >>
> >> Thanks
> >>
> >> 2011/12/4 Gaojinchao <[EMAIL PROTECTED]>
> >>
> >> > I have attached the stack trace in
> >> > https://issues.apache.org/jira/browse/HBASE-4633.
> >> > I will update our story.
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Ted Yu [mailto:[EMAIL PROTECTED]]
> >> > Sent: December 5, 2011 7:37
> >> > To: [EMAIL PROTECTED]; lars hofhansl
> >> > Subject: Re: FeedbackRe: Suspected memory leak
> >> >
> >> > I looked through the TRUNK and 0.90 code but didn't find
> >> > HBaseClient.Connection.setParam().
> >> > The method should be sendParam().
> >> >
> >> > When I was in China I tried to access Jonathan's post but wasn't able to.
> >> >
> >> > If Jinchao's stack trace resonates with the one Jonathan posted, we should
> >> > consider using Netty for HBaseClient.
> >> >
> >> > Cheers
> >> >
> >> > On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > I think HBASE-4508 is unrelated.
> >> > > The "connections" I am referring to are HBaseClient.Connection objects
> >> > > (not HConnections).
> >> > > It turns out that HBaseClient.Connection.setParam is actually called
> >> > > directly by the client threads, which means we can get
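The chunking idea quoted above can be sketched as follows. The assumption behind it: when a heap ByteBuffer is written to an NIO socket channel, the JDK copies it into a temporary direct buffer at least as large as the bytes being written and caches that buffer per thread, so many application threads sending large put lists can each end up holding a large off-heap buffer. Handing the channel at most 8 KB per write() call bounds that copy. ChunkedWriter and writeChunked are hypothetical names, not HBase code, and the loop assumes a blocking channel (a non-blocking one would need a selector rather than spinning on zero-byte writes).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper illustrating "chunking into (say) 8k chunks": each
// write() sees at most CHUNK bytes, so any temporary direct copy the JDK makes
// of a heap buffer is bounded by CHUNK rather than by the size of the request.
class ChunkedWriter {
    static final int CHUNK = 8 * 1024;

    /** Writes all remaining bytes of src; assumes a blocking channel. */
    static long writeChunked(WritableByteChannel ch, ByteBuffer src) throws IOException {
        final int end = src.limit();
        long written = 0;
        try {
            while (src.position() < end) {
                // Temporarily narrow the limit so the channel sees one chunk.
                src.limit(Math.min(src.position() + CHUNK, end));
                while (src.hasRemaining()) {
                    written += ch.write(src);
                }
            }
        } finally {
            src.limit(end); // restore the caller's limit, even on failure
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ByteBuffer payload = ByteBuffer.wrap(new byte[100_000]); // e.g. a serialized put list
        long n = writeChunked(Channels.newChannel(sink), payload);
        System.out.println(n + " bytes written, sink holds " + sink.size());
    }
}
```

The cost is more write() system calls per large request; the benefit is that the per-thread direct-buffer cache never holds more than one chunk-sized copy per thread.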
-
Re: Suspected memory leak - Stack, 2011-12-05, 20:03
2011/12/5 Ted Yu <[EMAIL PROTECTED]>:
> Overall, I think we can resolve HBASE-4633 and put further discussion under
> https://issues.apache.org/jira/browse/HBASE-4956
>

We need the workaround though, don't we, Ted? We could commit the
workaround as part of HBASE-4633? (Thanks for opening HBASE-4956.)
St.Ack
-
Re: Suspected memory leak - lars hofhansl, 2011-12-05, 20:05
Netty is a good option, but it will probably lead to a lot of refactoring.

sendParam is called in the context of the client thread. All further communication is enqueued.
See for example HBaseClient.call(...):
The call is enqueued (in getConnection).
Then sendParam is called directly (which means it is called by the calling application thread).
Then the calling thread waits for completion of the call, which is now handled by the connection thread.

What I am wondering is why we don't call sendParam as part of the queued operation. That would ensure that only a known set of threads is performing NIO using direct buffers.
There must be a good reason for this, and I think it is to ensure the order of operations, but I am not entirely sure.

-- Lars
-
Re: Suspected memory leak - Shrijeet Paliwal, 2011-12-05, 20:08
Stack,
4633's summary and the workaround don't go together. There is some RPC-timeout-related chatter (I am guilty) in 4633 which might confuse readers joining the party late.
-
Re: Suspected memory leak - Stack, 2011-12-05, 20:18
On Mon, Dec 5, 2011 at 12:05 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> What I am wondering is why we don't call sendParam as part of the queued operation. That would
> ensure that only a known set of threads is performing NIO using direct buffers.

Is it in case there is an exception? If an exception happens down in the queuing, the caller won't get it?
As for guaranteeing order, we could have the queue implementation do this?
St.Ack
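The active-object idea and the two questions above (exceptions reaching the caller, preserving order) can be sketched with a single-thread executor per connection: application threads enqueue their sends, one writer thread performs all socket writes, tasks run one at a time in submission order, and Future.get() rethrows a send failure in the calling thread. This is a minimal sketch under those assumptions; QueuedSender, sendAndWait and the Consumer standing in for sendParam() are hypothetical, and it is not how HBaseClient.Connection is actually written.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: all sends for one connection run on a single writer
// thread, so only that known thread does NIO, sends happen in submission
// order, and a failure is rethrown to the application thread that asked.
class QueuedSender<T> implements AutoCloseable {
    private final ExecutorService writer;
    private final Consumer<T> send; // stands in for sendParam(); an assumption

    QueuedSender(Consumer<T> send) {
        this.send = send;
        this.writer = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connection-writer");
            t.setDaemon(true);
            return t;
        });
    }

    /** Called by application threads; blocks until this param has been sent. */
    void sendAndWait(T param) throws InterruptedException, ExecutionException {
        Future<?> done = writer.submit(() -> send.accept(param));
        done.get(); // an exception thrown by send resurfaces here, wrapped
    }

    @Override
    public void close() {
        writer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (QueuedSender<String> s = new QueuedSender<>(
                p -> System.out.println(Thread.currentThread().getName() + " sends " + p))) {
            s.sendAndWait("put #1");
            s.sendAndWait("put #2");
        }
    }
}
```

The trade-off is one extra thread hand-off per call in exchange for confining NIO, and with it the JDK's per-thread direct-buffer cache, to a known set of threads.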
-
Re: Suspected memory leak - Stack, 2011-12-05, 20:19
On Mon, Dec 5, 2011 at 12:08 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:
> Stack,
> 4633's summary and the workaround don't go together.
> There is some RPC-timeout-related chatter (I am guilty) in 4633 which
> might confuse readers joining the party late.
>

Thanks, boss. Will just add the client workaround to the FAQ.
St.Ack