
Re: Efficient backup and a reasonable restore of an ensemble
On the restore side, I think a separate utility that manipulates the
data/snap dir (by truncating the log and removing snapshots back to a given
zxid) would be easier than modifying the server.
Thawan Kooburat

On 7/8/13 6:34 PM, "kishore g" <[EMAIL PROTECTED]> wrote:

>I think what we are looking at is point-in-time restore functionality.
>How about adding a feature that says go back to a specific zxid/timestamp?
>This way, before making any change to ZooKeeper, simply note down the
>timestamp/zxid on the leader. If things go wrong after making changes, bring
>down the ZooKeeper servers and provide an additional zxid/timestamp parameter
>when restarting. The server can go to that exact point and make it current. The
>followers can be started blank.
>On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>> Just saw that this is the use case corresponding to the question posted
>> on the dev list.
>> In order to restore the data to a given point in time correctly, you need
>> both the snapshot and the txnlog. This is because a ZooKeeper snapshot is
>> fuzzy: a snapshot alone may not represent a valid state of the server if
>> there are in-flight requests.
>> The 4lw command should cause the server to roll the log and take a
>> snapshot, similar to the periodic snapshotting operation. Your backup script
>> needs to grab the snapshot and the corresponding txnlog file from the data dir.
>> To restore, just shut down all hosts, clear the data dir, copy over the
>> snapshot and txnlog, and restart them.
>> --
>> Thawan Kooburat
>> On 7/8/13 3:28 PM, "Sergey Maslyakov" <[EMAIL PROTECTED]> wrote:
>> >Thank you for your response, Flavio. I apologize, I did not provide a
>> >clear explanation of the use case.
>> >
>> >This backup/restore is not intended to be tied to any write event.
>> >Instead, it is expected to run as a periodic (daily?) cron job on one of the
>> >servers, which is not guaranteed to be the leader of the ensemble. There is
>> >no expectation that all recent changes are committed and persisted to disk.
>> >The system can sustain the loss of several hours' worth of recent changes in
>> >the event of a restore.
>> >
>> >As for finding the leader dynamically and performing the backup on it, this
>> >approach could be more difficult, as the leader can change from time to time
>> >and I still need to fetch the file to store it in my designated backup
>> >location. Taking the backup on one server and picking it up from a local file
>> >system looks less error-prone. Even if I went the fancy route and had
>> >ZooKeeper send me the serialized DataTree in response to the 4lw, this
>> >approach would involve a lot of moving parts.
>> >
>> >I have already made a PoC for a new 4lw that invokes takeSnapshot() and
>> >returns an absolute path to the snapshot it drops on disk. I have
>> >protected takeSnapshot() from concurrent invocation, which is likely to
>> >corrupt the snapshot file on disk. This approach works, but I'm thinking of
>> >taking it one step further by providing the desired path name as an argument
>> >to my new 4lw and having the ZooKeeper server drop the snapshot into the
>> >specified file and report success/failure back. This way I can avoid
>> >cluttering the data directory and interfering with what ZooKeeper finds
>> >when it scans the data directory.
>> >
>> >The approach of having an additional server that would take the leadership
>> >and populate the ensemble is just a theory. I don't see a clean way of
>> >making a quorum member the leader of the quorum. Am I overlooking
>> >something simple?
>> >
>> >In backup and restore of an ensemble, the biggest unknown for me remains
>> >populating the ensemble with the desired data. I can think of two ways:
>> >
>> >1. Clear out all servers by stopping them, purge the version-2 directories,
>> >restore a snapshot file on one server that will be brought up first, and
>> >bring up the rest of the ensemble. This way I somewhat force the first