Zookeeper, mail # user - Re: Efficient backup and a reasonable restore of an ensemble


Re: Efficient backup and a reasonable restore of an ensemble
Sergey Maslyakov 2013-07-09, 04:42
Kishore,

This sounds like a very elaborate tool. I was trying to find a simple
approach, but what Thawan said about "fuzzy snapshots" makes me a little
afraid that there is no simple solution.
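
For reference, the backup step Thawan describes in the quoted message below (grab the snapshot plus the corresponding txnlog) could be sketched roughly like this. It is only a sketch under assumptions: it assumes the data dir uses the usual `snapshot.<hex zxid>` and `log.<hex zxid>` file names, and the helper names `split_by_kind` and `select_backup_files` are made up for illustration. It only decides which files to copy; it does not talk to the server.

```python
import re

# Assumption: data-dir files are named "snapshot.<hex zxid>" and "log.<hex zxid>".
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def split_by_kind(names):
    """Return (snapshots, logs) as sorted lists of (zxid, name) tuples."""
    snaps, logs = [], []
    for name in names:
        m = _NAME.match(name)
        if not m:
            continue  # skip anything that is not a snapshot or txnlog file
        entry = (int(m.group(2), 16), name)
        (snaps if m.group(1) == "snapshot" else logs).append(entry)
    return sorted(snaps), sorted(logs)


def select_backup_files(names):
    """Pick the newest snapshot and the txnlogs needed to replay past it."""
    snaps, logs = split_by_kind(names)
    if not snaps:
        return []
    snap_zxid, snap_name = snaps[-1]
    # The newest log that starts at or before the snapshot zxid may still
    # hold transactions the fuzzy snapshot needs; keep it and all later logs.
    start = max((z for z, _ in logs if z <= snap_zxid), default=0)
    return [snap_name] + [n for z, n in logs if z >= start]


if __name__ == "__main__":
    listing = ["snapshot.100", "snapshot.200", "log.1", "log.101", "log.1f0", "log.201"]
    print(select_backup_files(listing))
```

Keeping the log that starts at or just before the snapshot is the conservative choice: because the snapshot is fuzzy, that log is what makes the restored state consistent.
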
On Mon, Jul 8, 2013 at 11:05 PM, kishore g <[EMAIL PROTECTED]> wrote:

> Agreed, we already have such a tool. In fact, we use it to reconstruct the
> sequence of events that led to a failure, restore the system to a previous
> stable point, and replay the events. Unfortunately it is closely tied to
> Helix, but it should be easy to turn it into a generic tool.
>
> Sergey, is this something that would be useful in your case?
>
> Thanks,
> Kishore G
>
>
> On Mon, Jul 8, 2013 at 8:09 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>
> > On the restore part, I think having a separate utility to manipulate the
> > data/snap dir (truncating the log and removing snapshots back to a given
> > zxid) would be easier than modifying the server.
> >
> >
> > --
> > Thawan Kooburat
> >
> >
> >
> >
> >
> > On 7/8/13 6:34 PM, "kishore g" <[EMAIL PROTECTED]> wrote:
> >
> > >I think what we are looking at is point-in-time restore functionality.
> > >How about adding a feature that says "go back to a specific
> > >zxid/timestamp"? That way, before making any change to ZooKeeper, you
> > >simply note down the timestamp/zxid on the leader. If things go wrong
> > >after making the changes, bring down the ZooKeeper servers and pass the
> > >zxid/timestamp as an additional parameter when restarting. The server
> > >can go to that exact point and make it current. The followers can be
> > >started blank.
> > >
> > >
> > >
> > >On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
> > >
> > >> Just saw that this is the use case behind the question posted on the
> > >> dev list.
> > >>
> > >> In order to restore the data to a given point in time correctly, you
> > >> need both the snapshot and the txnlog. This is because a ZooKeeper
> > >> snapshot is fuzzy, and a snapshot alone may not represent a valid
> > >> state of the server if there are in-flight requests.
> > >>
> > >> The 4lw command should cause the server to roll the log and take a
> > >> snapshot, similar to the periodic snapshotting operation. Your backup
> > >> script needs to grab the snapshot and the corresponding txnlog file
> > >> from the data dir.
> > >>
> > >> To restore, just shut down all hosts, clear the data dir, copy over
> > >> the snapshot and txnlog, and restart them.
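
The copy step of that restore could look roughly like the sketch below. The function name `restore_data_dir` and both directory arguments are hypothetical, and the sketch assumes the server that owns the directory has already been shut down (nothing in the code checks this). It also clears only `snapshot.*` and `log.*` files rather than the whole directory, which is a more conservative choice than the text describes, so that unrelated files in the directory survive.

```python
import shutil
from pathlib import Path

PREFIXES = ("snapshot.", "log.")


def restore_data_dir(backup_dir, data_dir):
    """Replace snapshot.*/log.* files in data_dir with those from backup_dir.

    Assumes the server using data_dir is stopped; this is not verified here.
    Returns the names of the files that were copied, in sorted order.
    """
    backup, target = Path(backup_dir), Path(data_dir)
    wanted = [p for p in backup.iterdir()
              if p.is_file() and p.name.startswith(PREFIXES)]
    if not any(p.name.startswith("snapshot.") for p in wanted):
        raise ValueError("backup contains no snapshot file")
    target.mkdir(parents=True, exist_ok=True)
    # Clear only old snapshots and logs; other files are left untouched.
    for old in target.iterdir():
        if old.is_file() and old.name.startswith(PREFIXES):
            old.unlink()
    copied = []
    for src in sorted(wanted):
        shutil.copy2(src, target / src.name)
        copied.append(src.name)
    return copied
```

The same function would be run on every host before restarting them, each with its own copy of the backed-up snapshot and txnlog.
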
> > >>
> > >>
> > >> --
> > >> Thawan Kooburat
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On 7/8/13 3:28 PM, "Sergey Maslyakov" <[EMAIL PROTECTED]> wrote:
> > >>
> > >> >Thank you for your response, Flavio. I apologize, I did not provide a
> > >> >clear explanation of the use case.
> > >> >
> > >> >This backup/restore is not intended to be tied to any write event.
> > >> >Instead, it is expected to run as a periodic (daily?) cron job on one
> > >> >of the servers, which is not guaranteed to be the leader of the
> > >> >ensemble. There is no expectation that all recent changes are
> > >> >committed and persisted to disk. The system can sustain the loss of
> > >> >several hours' worth of recent changes in the event of a restore.
> > >> >
> > >> >As for finding the leader dynamically and performing the backup on
> > >> >it, this approach could be more difficult, as the leader can change
> > >> >from time to time and I still need to fetch the file to store it in
> > >> >my designated backup location. Taking the backup on one server and
> > >> >picking it up from the local file system looks less error-prone. Even
> > >> >if I went the fancy route and had ZooKeeper send me the serialized
> > >> >DataTree in response to the 4lw, this approach would involve a lot of
> > >> >moving parts.
> > >> >
> > >> >I have already made a PoC for a new 4lw that invokes takeSnapshot()
> > >> >and returns an absolute path to the snapshot it drops on disk. I have
> > >> >already protected takeSnapshot() from concurrent invocation, which is
> > >> >likely