Re: ZooKeeper's resident set size grows but does not shrink
Henry Robinson 2012-05-29, 20:04
Hi Brian -
(Copying the list as well for general interest)
I dug into this a bit this weekend. The heap dump does show that heap usage
is unexpectedly high, and that ZK is using more memory than you might think.
The root cause is that each server maintains a 'committed log' of the last
500 proposals in memory. This speeds up the case where another server is
trying to catch up and is behind by fewer than 500 proposals: the up-to-date
server can send them directly from memory.
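To make the mechanism concrete, here is a minimal Java sketch of a bounded committed log. It is not ZooKeeper's actual code: the class and method names (CommittedLogSketch, canCatchUp, retainedBytes) are hypothetical, and the catch-up check is a simplified, conservative version of the idea described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded in-memory "committed log". Names are
// illustrative only; this is not ZooKeeper's actual implementation.
class CommittedLogSketch {
    // A committed proposal: its transaction id (zxid) and its data payload.
    record Proposal(long zxid, byte[] data) {}

    private final int capacity;                        // e.g. 500 in the scenario above
    private final Deque<Proposal> log = new ArrayDeque<>();

    CommittedLogSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // Append a newly committed proposal, evicting the oldest once the log is full.
    // The evicted payload is only freed once nothing else references it.
    void add(long zxid, byte[] data) {
        if (log.size() == capacity) {
            log.removeFirst();
        }
        log.addLast(new Proposal(zxid, data));
    }

    // Conservative check: a follower whose last applied zxid falls inside the
    // window [oldest, newest] can be sent every later proposal from memory.
    boolean canCatchUp(long followerZxid) {
        if (log.isEmpty()) return false;
        return followerZxid >= log.peekFirst().zxid() && followerZxid <= log.peekLast().zxid();
    }

    // Payload bytes kept reachable by the log (ignores object headers).
    long retainedBytes() {
        long total = 0;
        for (Proposal p : log) total += p.data().length;
        return total;
    }

    int size() {
        return log.size();
    }
}
```

The point of the sketch is that eviction is the only thing bounding memory: a log that stays full of large proposals keeps all of their payloads reachable until newer proposals push them out.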
So the last 500 proposals, along with their associated data, are kept in
memory. For my experiments, I created and then deleted a znode 1000 times
with 250k of data, so 50% of the last 500 transactions were 'large'.
As a result, I expected to see ~66MB of extra data in the Java heap.
What I actually saw was ~192MB taken up by byte arrays. Some digging into
the heap showed that the data in the commit log is actually copied into
*three* different places.
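The arithmetic behind those two figures can be sketched as follows. This is a back-of-the-envelope estimate, not a measurement; it assumes "250k" means 250 * 1024 bytes and ignores the deletes, which carry no data. The class name CommitLogEstimate is hypothetical.

```java
// Back-of-the-envelope estimate for the experiment described above.
// Assumption: "250k" means 250 * 1024 bytes; the empty-data deletes are ignored.
class CommitLogEstimate {
    static final long COMMIT_LOG_SIZE = 500;         // proposals kept in memory
    static final long PAYLOAD_BYTES = 250L * 1024;   // assumed size of each large payload

    // Bytes retained when each large payload is stored in `copies` places.
    static long retainedBytes(int copies) {
        long largeProposals = COMMIT_LOG_SIZE / 2;   // creates and deletes alternate
        return largeProposals * PAYLOAD_BYTES * copies;
    }
}
```

Under these assumptions, retainedBytes(1) is 64,000,000 bytes, close to the ~66MB expected (the exact figure depends on how "250k" and "MB" are read), and retainedBytes(3) is 192,000,000 bytes, which lines up with the ~192MB of byte arrays observed.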
This doesn't fully explain the 1.5GB of byte[] usage that you're seeing. It
might be worth forcing a full GC from jvisualvm or similar and seeing if
anything gets cleaned up. Another way to test my hypothesis is to 'flush'
the commit log with 500 small transactions (for example, repeatedly setting
a znode's data to ""). This should push the large proposals out of the
commit log, and you should see heap usage drop significantly. Of course,
the RSS will remain high, for the reasons discussed earlier. I'd love to see
the results if you still have the machines available to try these two things on.
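The flush experiment could look roughly like this. It is a sketch under assumptions: ZnodeWriter is a hypothetical interface so the loop runs without a server, and the znode path is up to you. With the real Java client you would pass a lambda such as (path, data) -> zk.setData(path, data, -1), where zk is a connected org.apache.zookeeper.ZooKeeper handle and -1 matches any version.

```java
// Sketch of the "flush the commit log" experiment: issue enough tiny writes
// that every large proposal is pushed out of the in-memory committed log.
// ZnodeWriter is a hypothetical stand-in so the sketch runs without a server.
class CommitLogFlush {
    @FunctionalInterface
    interface ZnodeWriter {
        void setData(String path, byte[] data) throws Exception;
    }

    // Overwrite `path` with empty data `count` times; returns the number of writes issued.
    static int flush(ZnodeWriter writer, String path, int count) throws Exception {
        byte[] empty = new byte[0];   // "" as data: a negligible payload per proposal
        for (int i = 0; i < count; i++) {
            writer.setData(path, empty);
        }
        return count;
    }
}
```

Using 500 writes matches the commit log size discussed above; if the log holds exactly that many proposals, the large ones are all displaced after the loop.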
I've filed https://issues.apache.org/jira/browse/ZOOKEEPER-1473 to track
the triple-memory issue. The maximum exposure for a single instance is an
extra 1GB of heap: not good, but not disastrous, and it only shows up under
a particular workload. Still, it would be good to get it fixed.
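One plausible way to arrive at the 1GB figure, assuming the default limit on znode data (jute.maxbuffer, 0xfffff bytes, roughly 1MB) and that two of the three copies are redundant, is sketched below. The class name MaxExposure is hypothetical.

```java
// Rough upper bound on the extra heap from the triple-copy issue, assuming a
// full committed log of maximum-size proposals. Assumes the default
// jute.maxbuffer limit of 0xfffff bytes and two redundant copies per payload.
class MaxExposure {
    static final long COMMIT_LOG_SIZE = 500;
    static final long MAX_PAYLOAD_BYTES = 0xfffff;   // 1,048,575 bytes
    static final long REDUNDANT_COPIES = 2;          // three copies held, one needed

    static long extraHeapBytes() {
        return COMMIT_LOG_SIZE * MAX_PAYLOAD_BYTES * REDUNDANT_COPIES;
    }
}
```

extraHeapBytes() evaluates to 1,048,575,000 bytes, i.e. roughly 1GB.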
On 23 May 2012 18:14, Brian Oki <[EMAIL PROTECTED]> wrote:
> Thanks for the quick reply. Here's the output of jmap -heap:
> using thread-local object allocation.
> Parallel GC with 4 thread(s)
>
> Heap Configuration:
>    MinHeapFreeRatio = 40
>    MaxHeapFreeRatio = 70
>    MaxHeapSize      = 3221225472 (3072.0MB)
>    NewSize          = 1310720 (1.25MB)
>    MaxNewSize       = 17592186044415 MB
>    OldSize          = 5439488 (5.1875MB)
>    NewRatio         = 2
>    SurvivorRatio    = 8
>    PermSize         = 21757952 (20.75MB)
>    MaxPermSize      = 174063616 (166.0MB)
>
> Heap Usage:
> PS Young Generation
> Eden Space:
>    capacity = 641073152 (611.375MB)
>    used     = 367863672 (350.82213592529297MB)
>    free     = 273209480 (260.55286407470703MB)
>    57.38247980785818% used
> From Space:
>    capacity = 212992000 (203.125MB)
>    used     = 188128576 (179.41339111328125MB)
>    free     = 24863424 (23.71160888671875MB)
>    88.32659254807692% used
> To Space:
>    capacity = 216334336 (206.3125MB)
>    used     = 0 (0.0MB)
>    free     = 216334336 (206.3125MB)
>    0.0% used
> PS Old Generation
>    capacity = 1739915264 (1659.3125MB)
>    used     = 1083663768 (1033.462303161621MB)
>    free     = 656251496 (625.8501968383789MB)
>    62.282559985633874% used
> PS Perm Generation
>    capacity = 21757952 (20.75MB)
>    used     = 9952064 (9.49102783203125MB)
>    free     = 11805888 (11.25897216796875MB)
>    45.73989316641566% used
> The histogram shows the following (a portion), if that helps. You can see
> that there's ~1.5 GB of byte[] in the heap that we can't account for.
> I didn't bother to dump the heap in binary format. All of the znodes and
> data created by the test have been deleted.
> Object Histogram:
>
> num   #instances   #bytes       Class description
> 1:    112235       1546244888   byte[]
> 2:    20804        69853352     int[]
> 3:    40920        4718272      char[]
> 4:    14659        2296304      * ConstMethodKlass
> 5:    39018        1872864      java.nio.HeapByteBuffer
> 6:    14659        1767608      * MethodKlass