Re: ZooKeeper's resident set size grows but does not shrink
Henry Robinson 2012-05-29, 20:04
Hi Brian -
(Copying the list as well for general interest)

I dug into this a bit this weekend. The heap dump does show that heap usage is unexpectedly high, and that ZK is using more memory than you might think it should.

The root cause is that each server maintains a 'committed log' of 500 proposals in memory. This is to speed up the case where another server is trying to catch up, and is only behind by < 500 proposals - the up-to-date server can send them directly. So each of the last 500 proposals, and their associated data, are kept in memory.

For the experiments I ran, I created and then deleted a znode 1000 times with 250k of data. So 50% of the last 500 transactions were 'large'. As a result, I expected to see ~66MB of extra data in the java heap. What I actually saw was ~192MB taken up by byte arrays. Some digging into the heap showed that the data in the commit log is actually copied into *three* different places.

This doesn't fully explain the 1.5G byte[] usage that you're seeing. It might be worth forcing a full GC from jvisualvm or similar and seeing if anything gets cleaned up. Another way to test my hypothesis is to 'flush' the commit log with 500 small transactions - repeatedly setting a znode's data to "", for example - this should free up the commit log and you should see heap usage drop significantly. Of course, the RSS will still remain high, for reasons discussed earlier. I'd love to see the results if you still have the machines available to try these two things on.

I've filed https://issues.apache.org/jira/browse/ZOOKEEPER-1473 to track the triple-memory issue. The maximum exposure for a single instance is an extra 1G in heap - not good, but not disastrous and it only shows up under a particular workload. Still, would be good to get it fixed.

Thanks,
Henry

On 23 May 2012 18:14, Brian Oki <[EMAIL PROTECTED]> wrote:
> Henry,
>
> Thanks for the quick reply. Here's the output of jmap -heap:
>
> using thread-local object allocation.
> Parallel GC with 4 thread(s)
>
> Heap Configuration:
>    MinHeapFreeRatio = 40
>    MaxHeapFreeRatio = 70
>    MaxHeapSize      = 3221225472 (3072.0MB)
>    NewSize          = 1310720 (1.25MB)
>    MaxNewSize       = 17592186044415 MB
>    OldSize          = 5439488 (5.1875MB)
>    NewRatio         = 2
>    SurvivorRatio    = 8
>    PermSize         = 21757952 (20.75MB)
>    MaxPermSize      = 174063616 (166.0MB)
>
> Heap Usage:
> PS Young Generation
> Eden Space:
>    capacity = 641073152 (611.375MB)
>    used     = 367863672 (350.82213592529297MB)
>    free     = 273209480 (260.55286407470703MB)
>    57.38247980785818% used
> From Space:
>    capacity = 212992000 (203.125MB)
>    used     = 188128576 (179.41339111328125MB)
>    free     = 24863424 (23.71160888671875MB)
>    88.32659254807692% used
> To Space:
>    capacity = 216334336 (206.3125MB)
>    used     = 0 (0.0MB)
>    free     = 216334336 (206.3125MB)
>    0.0% used
> PS Old Generation
>    capacity = 1739915264 (1659.3125MB)
>    used     = 1083663768 (1033.462303161621MB)
>    free     = 656251496 (625.8501968383789MB)
>    62.282559985633874% used
> PS Perm Generation
>    capacity = 21757952 (20.75MB)
>    used     = 9952064 (9.49102783203125MB)
>    free     = 11805888 (11.25897216796875MB)
>    45.73989316641566% used
>
> The histogram shows the following (a portion), if that helps. You can see
> that there's ~1.5 GB of byte[] in the heap of stuff we're uncertain about.
> I didn't bother to dump the heap in binary format. All of the znodes and
> data created by the test have been deleted.
>
>
> Object Histogram:
>
> num    #instances    #bytes        Class description
> --------------------------------------------------------------------------
> 1:     112235        1546244888    byte[]
> 2:     20804         69853352      int[]
> 3:     40920         4718272       char[]
> 4:     14659         2296304       * ConstMethodKlass
> 5:     39018         1872864       java.nio.HeapByteBuffer
> 6:     14659         1767608       * MethodKlass

Henry Robinson
Software Engineer
Cloudera
415-994-6679
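
The heap figures in Henry's reply can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: it takes "250k" to mean 250 * 1024 bytes and "MB" to mean 10**6 bytes, neither of which the thread spells out, and the function and constant names are made up for this example. Under those assumptions a single copy of the large payloads comes to about 64 MB, close to the ~66MB expected in the reply, and three copies come to about 192 MB, in line with what the heap dump showed.

```python
# Back-of-the-envelope check of the figures quoted in the reply.
# Assumptions (not stated in the thread): "250k" means 250 * 1024 bytes,
# and MB means 10**6 bytes. All names here are illustrative only.

COMMIT_LOG_SIZE = 500        # proposals kept in memory, per the reply
PAYLOAD_BYTES = 250 * 1024   # data carried by each create
LARGE_FRACTION = 0.5         # creates carry data, deletes do not
COPIES = 3                   # copies of each payload reported in the heap


def commit_log_bytes(entries, payload, large_fraction, copies=1):
    """Payload bytes held on behalf of the in-memory commit log."""
    return int(entries * large_fraction) * payload * copies


expected = commit_log_bytes(COMMIT_LOG_SIZE, PAYLOAD_BYTES, LARGE_FRACTION)
observed = commit_log_bytes(COMMIT_LOG_SIZE, PAYLOAD_BYTES, LARGE_FRACTION, COPIES)

print(f"one copy:     {expected / 1e6:.0f} MB")
print(f"three copies: {observed / 1e6:.0f} MB")
```

The same function can be reused with other payload sizes or log lengths to estimate how much heap a given workload might pin in the commit log.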