I'm hosting an intern this summer. One project I've been thinking about is to decouple zab from zookeeper. There are many use cases where you need quorum-based replication, but the hierarchical data model doesn't work well. A smallish (~1GB?) replicated key-value store with millions of entries is one such example. The goal of the project is to decouple the consensus algorithm (zab) from the data model (zookeeper) more cleanly so that users can define their own data models and use zab to replicate the data.
I have 2 questions:
1. Are there any caveats that I should be aware of? For example, transactions need to be idempotent to allow fuzzy snapshotting.
2. Is this useful? Personally I've seen many use cases where this would be very useful, but I'd like to hear what you guys think.
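The idempotency caveat in question 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the transaction format, the two keys, and the function names are all hypothetical, and the "fuzzy" snapshot is modelled by copying one key before and one key after some transactions have been applied. Recovery replays the whole log on top of the snapshot, which is only safe if re-applying a transaction is harmless.

```python
def apply_txn(state, txn):
    """Apply one (op, key, arg) transaction to a dict in place."""
    op, key, arg = txn
    if op == "set":        # idempotent: applying it twice gives the same result
        state[key] = arg
    elif op == "incr":     # not idempotent: applying it twice double-counts
        state[key] = state.get(key, 0) + arg
    else:
        raise ValueError(f"unknown op: {op}")


def run_with_fuzzy_snapshot(log):
    """Take a snapshot while transactions keep being applied, then recover.

    Returns (live_state, recovered_state).
    """
    live = {"a": 0, "b": 0}
    snapshot = {}

    snapshot["a"] = live["a"]      # key "a" is copied before any transaction
    for txn in log[:3]:            # three transactions land mid-snapshot
        apply_txn(live, txn)
    snapshot["b"] = live["b"]      # key "b" is copied after those three
    for txn in log[3:]:
        apply_txn(live, txn)

    # Recovery: start from the fuzzy snapshot and replay every transaction
    # logged since the snapshot began (here, the whole log).
    recovered = dict(snapshot)
    for txn in log:
        apply_txn(recovered, txn)
    return live, recovered


SET_LOG = [("set", "a", 1), ("set", "b", 1), ("set", "a", 2), ("set", "b", 2)]
INCR_LOG = [("incr", "a", 1), ("incr", "b", 1), ("incr", "a", 1), ("incr", "b", 1)]

if __name__ == "__main__":
    print(run_with_fuzzy_snapshot(SET_LOG))
    print(run_with_fuzzy_snapshot(INCR_LOG))
```

With the "set" log the recovered state matches the live state. With the "incr" log the snapshot already contains one increment of "b", so replay counts it twice and the recovered state diverges.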
1- You'd like to be able to plug in new algorithms or at least make a clear separation of the replication protocol and the logic of the service.
2- You'd like to have an implementation of Zab that you could use for other things, like a kv store.
I think you're focusing more on 2. You can definitely use Zab for other things, and I'm all for it. It would probably be better to just implement the protocol from scratch rather than extract it from ZooKeeper. In fact, it might be worth having a look at ZK-30 (old one, huh?).
In the case of reimplementing it, it might be worth doing it outside ZooKeeper, as a separate project. It could be an incubated project.
Hope it helps!
-Flavio
On 31 May 2014, at 22:29, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:
On 31 May 2014 14:29, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:
I think this is super useful. As Flavio said, I think there are two approaches: having ZAB as a library first or carving out the ZAB bits and having a generic interface to plug in other protocols.
From the ZooKeeper project's PoV, I think that the latter would be awesome, because we can clean up a lot of code as it happens.
From an intern project's PoV, it sounds like working on an independent ZAB implementation (libzab?) from scratch is easier to target (and will have no impedance; getting huge changes merged into ZooKeeper takes time...).
-rgs
Thank you Flavio and Raul. Thank you for pointing me to ZOOKEEPER-30. Yes, I was focused more on 2, but it's definitely a good idea to have a generic interface for atomic broadcast so that you can plug in different algorithms. It seems like the project can be broken into 3 pieces:
1. Define an interface for atomic broadcast. I'm not sure how things like session tracker and dynamic reconfig fit into this.
2. Add a ZAB implementation of the interface.
3. Create a simple reference implementation of a service (maybe a simple key-value store or a benchmark tool).
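To make the three pieces concrete, here is a minimal sketch of what such a split could look like. Everything in it is an assumption made for illustration: the names AtomicBroadcast, StateMachine, LocalBroadcast and KeyValueStore are hypothetical, and LocalBroadcast is a single-process stand-in that delivers in log order with no network, quorum, or leader election. It is not ZAB; a real ZAB implementation would sit behind the same interface.

```python
from abc import ABC, abstractmethod


class StateMachine(ABC):
    """Piece 3: the service. It only sees transactions in delivery order."""

    @abstractmethod
    def deliver(self, index, txn):
        """Apply the transaction at position `index` of the replicated log."""


class AtomicBroadcast(ABC):
    """Piece 1: the interface a service uses to replicate a transaction."""

    @abstractmethod
    def broadcast(self, txn):
        """Replicate `txn` and return its position in the total order."""


class LocalBroadcast(AtomicBroadcast):
    """Stand-in for piece 2: same interface, but no real replication."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._log = []

    def broadcast(self, txn):
        self._log.append(txn)
        index = len(self._log) - 1
        for replica in self._replicas:   # every replica sees the same order
            replica.deliver(index, txn)
        return index


class KeyValueStore(StateMachine):
    """A tiny reference service: an in-memory key-value map."""

    def __init__(self):
        self.data = {}
        self.last_applied = -1

    def deliver(self, index, txn):
        if index != self.last_applied + 1:
            raise RuntimeError("gap or reordering in delivery")
        op, key, value = txn
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.last_applied = index


if __name__ == "__main__":
    replicas = [KeyValueStore() for _ in range(3)]
    ab = LocalBroadcast(replicas)
    ab.broadcast(("put", "k", "v1"))
    ab.broadcast(("put", "k", "v2"))
    print([r.data for r in replicas])
```

Session tracking and reconfiguration are deliberately left out of the sketch, since where they belong is exactly the open question in item 1.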
I agree with both of you that it's better to do this as a separate project. Also, it might be better to do this as an incubator project from the beginning. I think it makes it easier for people from different organizations to collaborate. I'm willing to champion the project.
I'll open a JIRA once the intern is committed to the project.
The use case this project is going after is to durably replicate in-memory state. I think this project can differentiate itself from BookKeeper.
1. BookKeeper is pretty heavyweight, as you need to deploy ZooKeeper and bookies. I think there are use cases where you don't need the horizontal scalability BookKeeper provides, and you prefer to have a light-weight library for replicating state. ZooKeeper is one such example :)
2. Please correct me if I'm wrong, but BookKeeper is not designed for maintaining multiple in-memory replicas. A ledger can't be opened for reading if it's already open for writing, and you need to recover by restoring from a snapshot and replaying log entries if the writer goes down.
3. ZOOKEEPER-30, which I wasn't initially aware of, is another motivation. I think there is a value in having a common interface for consensus algorithms so that services can plug in different implementations. This makes it easier to benchmark and test correctness of various implementations.
On Sun, Jun 1, 2014 at 3:05 AM, Ivan Kelly <[EMAIL PROTECTED]> wrote:
I'm not sure it is worth transforming this discussion into a bk vs. zk/zab. I think the space they target is different, although they both deal with replication. It does sound worth having a separate zab implementation, but it isn't clear that it is worth separating zab in the zookeeper code base.
There seem to be some misconceptions here, so here are some clarifications:
- Zab itself doesn't deal with snapshots, it essentially replicates a log. The use of snapshots is an optimization to speed up recovery, and sure, it fits well into the framework of the protocol.
- BookKeeper indeed relies on zk because it requires a component for configuration and metadata of ledgers. By relying on a separate configuration component, the pool of bookies can grow and shrink arbitrarily, and such changes do not affect write performance like with zk. The configuration component, however, needs the properties of a protocol like zab, so we still need something like zab.
- Calling BK heavyweight is a bit of a stretch. Bookies + zk makes only two components! These are not production numbers, but I don't see a deployment with fewer than 10 machines (5 for ZK + 5 bookies) being very interesting. If that's a significant fraction of your overall server footprint, then sure, it is heavy for you.
On 01 Jun 2014, at 19:22, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:
Thank you for the clarifications, Flavio. I guess 'heavyweight' is a relative term. A typical use case I deal with is replicating a small amount of data (<1GB) among 3 ~ 5 servers, and having access to zab would be very useful.
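For ensembles of this size the quorum arithmetic is small enough to spell out. The sketch below assumes simple majority quorums; the function names are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption, 3 servers tolerate 1 failure and 5 servers tolerate 2, while a 4th server raises the quorum size without tolerating any more failures than 3 do.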
I didn't mean to suggest separating zab in the zookeeper code base. I referred to ZOOKEEPER-30 to highlight the usefulness of having a common interface for replication protocols.
Thanks!
On Sun, Jun 1, 2014 at 2:52 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
I think that reconfig should be the responsibility of the atomic broadcast / replicated log implementation (if supported by the specific implementation). Client management and sessions seem application-dependent.
I'd also suggest checking out existing open source paxos libraries as an API reference.
On Sun, Jun 1, 2014 at 6:11 PM, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:
I agree that the reconfiguration is a responsibility of the atomic broadcast. I feel that session management might need to rely on the atomic broadcast exposing additional primitives. For example, right now ZooKeeper forwards session information to the leader by piggybacking it in the quorum ping packets.
Let me know if you know of good open source libraries to use as references. So far I've looked at ZooKeeper and goraft.
On Sun, Jun 1, 2014 at 6:36 PM, Alexander Shraer <[EMAIL PROTECTED]> wrote:
Decoupling ZAB is a good idea and, like you all mentioned, it could be used for other things, like a key-value store.
I've come across one such case in HDFS, where they have solved the problem their own way. As far as I know, the approach taken in that design is based on the well-known ZAB and Paxos. So I hope there is space for such libraries in the real world.
I was thinking from the point of view that if you want to provide ZAB as a library, then the library will have to provide an RPC mechanism for talking to other members of the quorum, and a means to persist updates to disk before responding, and _then_ provide a ZAB implementation somewhere in between. This doesn't seem much lighter than BK.
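The "persist updates to disk before responding" requirement can be sketched as follows. This is a hypothetical illustration, not ZooKeeper's transaction log format: DurableLog, handle_proposal, read_all and the length-prefixed record layout are all assumptions, and the RPC layer is left out entirely.

```python
import os
import struct
import tempfile


class DurableLog:
    """Append-only log that syncs every record to disk before returning."""

    def __init__(self, path):
        self._f = open(path, "ab")

    def append(self, payload):
        # 4-byte big-endian length prefix, then the payload bytes.
        self._f.write(struct.pack(">I", len(payload)) + payload)
        self._f.flush()               # push Python's buffer to the OS
        os.fsync(self._f.fileno())    # ask the OS to push it to the device

    def close(self):
        self._f.close()


def handle_proposal(log, payload):
    """Acknowledge a proposal only after it has been made durable."""
    log.append(payload)
    return "ACK"


def read_all(path):
    """Read records back, ignoring a torn record at the tail."""
    entries = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:
                break
            entries.append(payload)
    return entries


def demo():
    """Write two proposals to a temporary log and read them back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "txn.log")
        log = DurableLog(path)
        acks = [handle_proposal(log, p) for p in (b"txn-1", b"txn-2")]
        log.close()
        return acks, read_all(path)


if __name__ == "__main__":
    print(demo())
```

Syncing on every append is the simplest choice but also the slowest; batching several proposals per sync is a common trade-off that this sketch does not attempt.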
I think it's a worthwhile thing to pursue, but I disagree that a separate project is a better way of doing it. If this is an intern project, expecting them to reimplement ZAB might be a bit of a large ask (depending on the internship length and the intern themselves). An investigation into splitting the user interface layer of zookeeper and ZAB seems itself to be a nice chunk to work on, and it has the advantage that even if the changes don't get merged into trunk, there will be a clearer picture as to why they can't be split.
You can read from a ledger while it is being written to, but right now it's polling. Twitter are working on some changes to make it more notification-like, to reduce latency between the primary writing and the secondary reading.
I have a few reasons for suggesting a separate project:
- I don't see a reason for tying the releases of an independent implementation of Zab to ZooKeeper
- The set of developers (and committers) interested in an independent implementation of Zab might be different compared to ZooKeeper; it could really be a separate community
- It really feels like parallel efforts along the lines of Curator and BookKeeper, so I see it following similar steps
Regarding the effort of an intern, I guess it depends how far you want the initial stretch to go. An initial implementation to contribute to Apache followed by community activity might get it going.
I agree with Flavio about keeping this a separate project. Having said that, I'm not 100% sure whether the intern will implement ZAB completely from scratch, or start from a fork of the ZooKeeper code base. At this point I'm somewhat leaning towards using the ZooKeeper code base as a starting point. As Ivan pointed out, it's pretty ambitious to implement ZAB correctly in a short amount of time, and it would be good to have something demonstrable at the end of the internship.
On Mon, Jun 2, 2014 at 9:19 AM, FPJ <[EMAIL PROTECTED]> wrote:
It would be great to do a clean implementation of Zab. We have added a lot of crap for backward compatibility, and the reconfig stuff, although a great feature properly implemented, didn't improve the state of the code. Also, an implementation of the Zab protocol, perhaps putting snapshots aside for v0.1, shouldn't take more than just a few weeks.
If you do it openly, say on github, then I volunteer to help.
-Flavio
On 03 Jun 2014, at 19:16, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:
On 3 June 2014 12:44, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
A clean-room implementation of ZAB could indeed be awesome for multiple purposes. Reasoning about the current implementation is sometimes challenging for those of us missing the historical context.
Would be more than happy to help with reviews and such as well.
-rgs
Thanks for the github repo address. I was just about to write to ask you to send it. I will follow up on this, as it is an interesting project. I read the entire conversation and agree with some points.
Thanks,
Claudiu
On Wed, Jun 4, 2014 at 9:46 AM, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote: