HDFS, mail # dev - BookKeeper Journal Manager for Namenode

BookKeeper Journal Manager for Namenode
Ivan Kelly 2011-11-15, 16:29
Hi guys,

I've just uploaded a patch to HDFS-234 which contains an implementation of JournalManager for BookKeeper. The code is ready for review, though I plan to add some more tests. It relies on HDFS-1580, which isn't in trunk yet. The code is on GitHub if you want to avoid faffing about with multiple patches: https://github.com/ivankelly/hadoop-common/tree/HDFS-234

To configure the namenode to use BK with this code, put the following in hdfs-site.xml


Where zkEnsemble is a semicolon-separated[1] list of ZooKeeper servers, and zkPath is the znode path under which the editlog metadata should be stored. For example, if you have 3 servers, zk1 to zk3, with ZooKeeper listening on port 2181, and you want to store the metadata under /hdfsnn, the URI would be bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsnn.
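
To make the URI layout concrete, here is a minimal Python sketch that builds such a URI from a list of servers and splits it back into its two parts. The helper names build_journal_uri and split_journal_uri are hypothetical and not part of the patch. Turning the semicolons into commas is an assumption about how the ensemble would be handed to a ZooKeeper client, which takes a comma-separated connect string.

```python
from urllib.parse import urlsplit


def build_journal_uri(zk_servers, zk_path):
    """Join host:port pairs with ';' and append the znode path (hypothetical helper)."""
    if not zk_path.startswith("/"):
        raise ValueError("zkPath must be an absolute znode path")
    return "bookkeeper://" + ";".join(zk_servers) + zk_path


def split_journal_uri(uri):
    """Return (zk_connect_string, zk_path) for a bookkeeper:// URI (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "bookkeeper":
        raise ValueError("expected a bookkeeper:// URI")
    # Assumption: the ensemble is passed on to a ZooKeeper client, whose
    # connect string is comma-separated, so ';' is swapped for ','.
    zk_connect = parts.netloc.replace(";", ",")
    return zk_connect, parts.path


# The three-server example from the text: zk1 to zk3 on port 2181, metadata under /hdfsnn.
servers = ["zk%d:2181" % i for i in (1, 2, 3)]
uri = build_journal_uri(servers, "/hdfsnn")
print(uri)
print(split_journal_uri(uri))
```

The whole ensemble sits in the authority part of the URI and the znode path in the path part, so a generic URI parser is enough to pull the two apart.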

I benchmarked this code against an NFS filer, local storage and a NoPersist implementation of JournalManager, which simply discards edits, to get a theoretical maximum. I ran the benchmark using NNThroughputBenchmark to create 100000 ops. I've attached the graph generated. The graph shows that BookKeeper sees similar throughput to NFS and local file (very slightly lower). Latency is a little higher, but once the disk cache for the local disk saturates, BK's latency is lower. The NFS filer has a big chunk of NVRAM, so it maintains low latency until the client saturates.