Re: Efficient backup and a reasonable restore of an ensemble
One aspect of your use case that still confuses me is whether you need to take a snapshot right after some event in your application. Even if you can tell ZooKeeper to take a snapshot, there is no guarantee that it will happen at the exact point you want if update operations keep coming.

If you use your four-letter-word approach, would you search for the leader, or would you simply take a snapshot at any server? If it has to go through the leader so that you are sure to get the most recent committed state, then it might not be a bad idea to have an API call that tells the leader to take a snapshot in a directory of your choice. Returning the name of the snapshot file so that you can copy it sounds like an option, but perhaps it is not as convenient.
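
To make the "search for the leader" question concrete, here is a minimal sketch of how a script might locate the leader with the existing `srvr` four-letter word, whose reply includes a `Mode:` line. The function names and the server list are hypothetical, and this only finds the leader; it does not trigger a snapshot, since no such command exists in the version discussed here.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command (e.g. 'srvr') and return the text reply.

    The server writes its reply and then closes the connection, so we
    read until recv() returns an empty byte string.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Extract the value of the 'Mode:' line (leader, follower, standalone)."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(servers):
    """Return the first (host, port) reporting 'Mode: leader', or None.

    `servers` is a hypothetical list of (host, client_port) pairs.
    Unreachable servers are skipped.
    """
    for host, port in servers:
        try:
            reply = four_letter_word(host, port, "srvr")
        except OSError:
            continue
        if parse_mode(reply) == "leader":
            return host, port
    return None
```

Note that leadership can change between the query and whatever you do next, so the answer is only a hint.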

The approach of adding another server is not very clear. How do you force it to be the leader? Keep in mind that if it crashes, then it will lose leadership.


On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <[EMAIL PROTECTED]> wrote:

> It looks like the "dev" mailing list is rather inactive. Over the past few
> days I only saw several automated emails from JIRA and this is pretty much
> it. Contrary to this, the "user" mailing list seems to be more alive and
> more populated.
> With this in mind, please allow me to cross-post here the message I sent
> into the "dev" list a few days ago.
> Regards,
> /Sergey
> === forwarded message begins here ==>
> Hi!
> I'm facing a problem that has been raised by multiple people, but none of
> the discussion threads seems to provide a good answer. I dug into the Zookeeper
> source code trying to come up with some possible approaches and I would
> like to get your input on those.
> Initial conditions:
> * I have an ensemble of five Zookeeper servers running v3.4.5 code.
> * The size of a committed snapshot file is in the vicinity of 1 GB.
> * There are about 80 clients connected to the ensemble.
> * Clients are heavily read-biased, i.e., they mostly read and rarely write. I
> would say less than 0.1% of queries modify the data.
> Problem statement:
> * Under certain conditions, I may need to revert the data stored in the
> ensemble to an earlier state. For example, one of the clients may ruin the
> application-level data integrity and I need to perform a disaster recovery.
> Things look nice and easy if I'm dealing with a single Zookeeper server. A
> file-level copy of the data and dataLog directories should allow me to
> recover later by stopping Zookeeper, swapping the corrupted data and
> dataLog directories with a backup, and firing Zookeeper back up.
> Now, the ensemble deployment and the leader election algorithm in the
> quorum make things much more difficult. In order to restore from a single
> file-level backup, I need to take the whole ensemble down, wipe out data
> and dataLog directories on all servers, replace these directories with
> backed up content on one of the servers, bring this server up first, and
> then bring up the rest of the ensemble. This [somewhat] guarantees that the
> populated Zookeeper server becomes a member of a majority and populates the
> ensemble. This approach works, but it is very involved and thus prone to
> human error.
> Based on a study of the Zookeeper source code, I am considering the following
> alternatives, and I seek advice from the Zookeeper development community as to
> which approach looks more promising, or whether there is a better way.
> Approach #1:
> Develop a complementary pair of utilities for export and import of the
> data. Both utilities will act as Zookeeper clients and use the existing
> API. The "export" utility will recursively retrieve data and store it in a
> file. The "import" utility will first purge all data from the ensemble and
> then reload it from the file.
> This approach seems to be the simplest and there are similar tools
> developed already. For example, the Guano Project:
> https://github.com/d2fn/guano
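
As an illustration of Approach #1, the export/import pair could be sketched roughly as follows. This is a sketch under assumptions, not what Guano does: `client` is assumed to be any object with kazoo-style `get`, `get_children`, `exists`, `set`, `create` and `delete` methods, and the function names are hypothetical. It copies only znode paths and data (no ACLs or ephemeral-node ownership), and the export is not an atomic snapshot, because each znode is read in a separate request while writes may continue.

```python
import base64


def _join(parent, child):
    """Build a child path; the root '/' needs no extra separator."""
    return "/" + child if parent == "/" else parent + "/" + child


def _children(client, path):
    """List children, skipping the reserved '/zookeeper' subtree at the root."""
    names = client.get_children(path)
    if path == "/":
        names = [n for n in names if n != "zookeeper"]
    return names


def export_tree(client, root="/"):
    """Walk the tree under `root` and return {path: base64-encoded data}."""
    dump = {}
    stack = [root]
    while stack:
        path = stack.pop()
        data, _stat = client.get(path)
        dump[path] = base64.b64encode(data or b"").decode("ascii")
        for child in _children(client, path):
            stack.append(_join(path, child))
    return dump


def import_tree(client, dump, root="/"):
    """Purge everything under `root` (assumed to exist), then reload `dump`."""
    for child in _children(client, root):
        client.delete(_join(root, child), recursive=True)
    # Shallower paths first, so every parent exists before its children.
    for path in sorted(dump, key=lambda p: p.count("/")):
        data = base64.b64decode(dump[path])
        if client.exists(path):
            client.set(path, data)
        else:
            client.create(path, data)
```

The returned dictionary contains only strings, so it can be written to a file with `json.dump` and read back with `json.load` before calling `import_tree`.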