Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper, mail # dev - Why is there a separate lookup HashMap in DataTree?

Copy link to this message
Why is there a separate lookup HashMap in DataTree?
Thomas Koch 2011-09-08, 15:35

I'm studying the ZK DataTree and wondered, whether the code overhead of a
lookup HashMap for the nodes provides the expected speed benefit.

Therefor I found japex[1], a microbenchmarking tool and wrote two different
"tree" implementations: A simple "HashMap<FullPath, Node>" and a real tree
where each Node contains "HashMap<name, Node> children".

I ran benchmarks to lookup random nodes with trees of different depth ("l" =
levels) and children count for each node. (see attachment,
tps=transactions/sec) The benchmark was executed single threaded on my laptop.
I don't dare to say, that it's accurate, but it may be good enough to make one

So the HashMap based lookup seems to be faster for all configurations, but at
most twice as fast. On the other hand the code is made more complicated, the
memory consumption of every path in the DataTree is doubled and the lookup
time may hardly be a bottleneck at all in a network application.

I scanned the VCS logs of ZK to find informations when the HashMap lookup was
introduced and how it was motivated, but it was already there when the project
came out of Yahoo.

Could sbd. please give more information, why the HashMap lookup has been
introduced and whether benchmarks were made at that time?

A slightly related topic: Does anybody run regulary benchmarks against ZK? Is
there a benchmark suite? Would you recommend a tool to build benchmarks for

[1] http://japex.java.net

Thank you,

Thomas Koch, http://www.koch.ro