
jack ma 2013-07-19, 18:00
Re: Zookeeper ensemble backup questions?

Here is how I see the backup process happening.

1. The Zookeeper server could be changed to support a new four-letter-word
(4lw) command that writes the current state of the DataTree into a snapshot
file, with the path and name provided as an argument to the command (subject
to permissions, disk space, and other system-level restrictions). I would
probably ask Zookeeper to save the snapshot in a directory outside of the
standard "dataDir", for the sake of cleanliness.

2. When Zookeeper server responds to the new "snapshot" command with
success indication, the requesting process knows that the file has been
written out and it can go and process it. It can add some metadata and
create an archive to store it somewhere, for example. Alternatively,
Zookeeper server could stream the data it would have written into a
snapshot as the response to the new "snapshot" command. This way, the
client becomes responsible for persistence and this lifts a number of
permission-related issues (but raises some other issues too). By the way,
snapshot files appear to be quite compressible: I have seen compression
factors of 20 and more on my own data.

3. Disk cleanups are performed.
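
The "add some metadata and create an archive" part of step 2 could look
roughly like the sketch below. It assumes the snapshot file has already been
written somewhere readable; the function name archive_snapshot, the
metadata.json member, and the metadata fields are all hypothetical choices
for illustration, not anything Zookeeper itself provides.

```python
import io
import json
import os
import tarfile
import time


def archive_snapshot(snapshot_path, archive_path, metadata=None):
    """Pack a snapshot file plus a small JSON metadata record into a .tar.gz.

    Returns the ratio of the original snapshot size to the archive size.
    """
    meta = dict(metadata or {})
    # Hypothetical metadata fields; add whatever the backup process needs.
    meta.setdefault("source_file", os.path.basename(snapshot_path))
    meta.setdefault("size_bytes", os.path.getsize(snapshot_path))
    meta.setdefault(
        "archived_at", time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    )
    payload = json.dumps(meta, indent=2, sort_keys=True).encode("utf-8")

    with tarfile.open(archive_path, "w:gz") as tar:
        # Keep the original file name so the snapshot can be restored as-is.
        tar.add(snapshot_path, arcname=os.path.basename(snapshot_path))
        info = tarfile.TarInfo(name="metadata.json")
        info.size = len(payload)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(payload))

    return os.path.getsize(snapshot_path) / os.path.getsize(archive_path)
```

The returned ratio is only a quick way to check how compressible a given
snapshot is; the actual factor depends entirely on the data.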

With this backup procedure the restore would turn into:

1. Stopping all ensemble members

2. Wiping out dataDir/version-2 and dataLogDir/version-2

3. Restoring the snapshot taken by the above backup procedure on one of the
servers into dataDir/version-2

4. Bringing this server online

5. Allowing some time for it to load the snapshot. You could send the "isro"
4lw command to it to see when it stops responding with "null". When the
response becomes "ro" or "rw", it is ready to populate the others with its
own data

6. Bringing up the other servers one by one, to allow them to form a quorum
with the populated server
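
Steps 2, 3, and 5 of the restore could be scripted roughly as below. This is
only a sketch under assumptions: the helper names (reset_and_restore,
four_letter_word, wait_until_serving) are made up for illustration, the
restored file is assumed to keep a name the server recognizes as a snapshot,
and stopping and starting the servers (steps 1, 4, and 6) is left to
whatever tooling manages the ensemble.

```python
import os
import shutil
import socket
import time


def reset_and_restore(data_dir, data_log_dir, snapshot_file):
    """Steps 2-3: wipe both version-2 directories, then copy the snapshot in."""
    # A set is used so a shared dataDir/dataLogDir is only wiped once.
    for base in {data_dir, data_log_dir}:
        version_dir = os.path.join(base, "version-2")
        shutil.rmtree(version_dir, ignore_errors=True)
        os.makedirs(version_dir)
    target = os.path.join(
        data_dir, "version-2", os.path.basename(snapshot_file)
    )
    shutil.copy2(snapshot_file, target)
    return target


def four_letter_word(host, port, word, timeout=5.0):
    """Send a 4lw command over a plain TCP socket and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace").strip()


def wait_until_serving(host, port, attempts=60, delay=1.0,
                       probe=four_letter_word):
    """Step 5: poll "isro" until the reply is "ro" or "rw"."""
    for _ in range(attempts):
        try:
            reply = probe(host, port, "isro")
        except OSError:
            reply = "null"  # treat "not reachable yet" like "not serving"
        if reply in ("ro", "rw"):
            return reply
        time.sleep(delay)
    raise TimeoutError("server did not start serving in time")
```

A call such as wait_until_serving("localhost", 2181) would then block until
the restored server reports "ro" or "rw"; the host, port, and retry settings
are placeholders to adjust for the actual ensemble.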

Hope this helps! I'd be glad to hear from people who know the internals of
the Zookeeper server better whether this approach is flawed or robust.

On Fri, Jul 19, 2013 at 1:00 PM, jack ma <[EMAIL PROTECTED]> wrote:

> I asked these questions in the thread
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=EG2XS557+j+[EMAIL PROTECTED]%3e
> ,
> but there has been no response.
> So I am posting the questions again here, hoping to get help
> from the community.
> I want to make sure I fully understand the procedures for zookeeper
> backup and disaster recovery:
> For the backup procedure on a zookeeper ensemble:
> (1) Log in to any host whose state is "Serving"
>            Question:
>                   Do I have to log in to the leader node, or is any node ok?
> (2) Copy the latest snapshot file and transaction log from the version-2
> directory.
>            Question:
>                   How do we make sure we do not copy corrupt files if the
> snapshot/transaction log is in the middle of an update? Do we have to shut
> down the node to make the copy?
>                   Besides the transaction log and snapshot, do we have to
> copy other files, such as the epoch files?
> For the disaster recovery procedure on a zookeeper ensemble:
> (1) Recreate the machines for the zookeeper ensemble
> (2) Copy the snapshot/transaction log we backed up into the zookeeper
> dataDir/version-2 and dataLogDir/version-2.
>            Question:
>                  Do we have to copy the epoch files?
>                  Do we have to copy the backed-up snapshot/transaction log
> to all the zookeeper nodes, or just to the first node we start?
> Appreciate your time and help.
> Jack