Zookeeper, mail # user - Re: Efficient backup and a reasonable restore of an ensemble


Flavio Junqueira 2013-07-08, 21:30
Sergey Maslyakov 2013-07-08, 22:28
Thawan Kooburat 2013-07-09, 00:53
kishore g 2013-07-09, 01:34
Thawan Kooburat 2013-07-09, 03:09
kishore g 2013-07-09, 04:05
Sergey Maslyakov 2013-07-09, 04:42
Ted Dunning 2013-07-09, 05:32
kishore g 2013-07-09, 05:08
Flavio Junqueira 2013-07-09, 09:12
Sergey Maslyakov 2013-07-09, 16:02
Ted Dunning 2013-07-09, 20:00
Flavio Junqueira 2013-07-09, 16:47
kishore g 2013-07-09, 17:01
RE: Efficient backup and a reasonable restore of an ensemble
Flavio Junqueira 2013-07-09, 17:04
Heh, nothing to be sorry about, thanks for the feedback and for raising these
points, Kishore.

-Flavio

-----Original Message-----
From: kishore g [mailto:[EMAIL PROTECTED]]
Sent: 09 July 2013 19:01
To: [EMAIL PROTECTED]
Subject: Re: Efficient backup and a reasonable restore of an ensemble

Sorry Flavio, I mixed two things in my previous email. When I said
"checkpoint A", I meant just saving the last committed transaction id (no
snapshot is taken). When we need to restore, we simply run the tool to bring
the data directory back to that particular zxid (we truncate the txn log
after that zxid). We can then restart the server, and it should come back at
that particular point.
The second part, about the fuzzy snapshot, was me trying to explain to Sergey
that it's not really fuzzy if he knows for sure that there are no updates
while the snapshot is being taken. This really depends on the use case: for
example, if all writes happen via a manually run tool, then the snapshot
should not be fuzzy.
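A minimal sketch of the checkpoint/rollback idea described above, assuming the transaction log is modeled as an in-memory list of (zxid, operation) records sorted by zxid. `checkpoint` and `truncate_after` are hypothetical helpers for illustration, not ZooKeeper APIs; a real tool would have to rewrite the on-disk log files on every server.

```python
from typing import List, Tuple

# Hypothetical record shape: (zxid, operation description).
Txn = Tuple[int, str]

def checkpoint(log: List[Txn]) -> int:
    """Remember only the last committed zxid; no snapshot is taken."""
    return log[-1][0] if log else 0

def truncate_after(log: List[Txn], zxid: int) -> List[Txn]:
    """Drop every transaction committed after the checkpoint zxid."""
    return [txn for txn in log if txn[0] <= zxid]

log = [(1, "create /app"), (2, "set /app v1")]
a = checkpoint(log)                      # checkpoint A -> 2
log += [(3, "set /app v2"), (4, "delete /app")]
log = truncate_after(log, a)             # roll back to A
print(a, log)  # 2 [(1, 'create /app'), (2, 'set /app v1')]
```

After the truncation, a restarted server rebuilds its state from its latest snapshot plus the remaining log, so any snapshot taken after the checkpoint zxid would presumably have to be set aside as well, since it already contains later changes.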

On Tue, Jul 9, 2013 at 9:02 AM, Sergey Maslyakov <[EMAIL PROTECTED]> wrote:

> I think I am having difficulties understanding the "fuzzy" concept.
> Let's say I started to serialize DataTree into a snapshot file and it
> took 30 seconds. During these 30 seconds, the server saw 5
> transactions that updated the data. Does this mean that the snapshot
> that I get on disk at the end of the 30-second interval will have some
> of these 5 transactions?
> Or will it have none? Or will it have all of them? Or will it be
> inconsistent and unreadable by Zookeeper?
>
> Please help me better understand the behavior behind the "fuzzy" term.
>
> For my use case, I am perfectly fine if I get a snapshot with none of
> these 5 transactions, considering that I will pick them up next time I
> take a snapshot.
>
>
> /Sergey
>
>
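A toy model of what "fuzzy" means here, not ZooKeeper code. According to the ZooKeeper Administrator's Guide, a snapshot file's name ends with the zxid of the last transaction committed when the snapshot started, the file may include a subset of the updates applied while it was being written, and recovery replays the transaction log from that zxid onward; because updates are idempotent, the result is the state at the end of the log. So the snapshot in the question may hold some, none, or all of the 5 transactions and is still readable, but on its own it may match no state that ever existed; it becomes exact only together with the log written after its zxid. The model assumes every update is a full-value `set`, which is idempotent by construction; `apply` and `restore` are hypothetical names.

```python
from typing import Dict, List, Tuple

Txn = Tuple[int, str, str]  # (zxid, path, new value): a full-value "set"

def apply(tree: Dict[str, str], txn: Txn) -> None:
    _, path, value = txn
    tree[path] = value  # idempotent: applying it twice changes nothing

def restore(snapshot_zxid: int, snapshot: Dict[str, str],
            log: List[Txn]) -> Dict[str, str]:
    """Replay every logged txn after the zxid encoded in the snapshot name."""
    tree = dict(snapshot)
    for txn in log:
        if txn[0] > snapshot_zxid:
            apply(tree, txn)
    return tree

# State when serialization starts: last committed zxid is 10.
start = {"/a": "a0", "/b": "b0", "/c": "c0"}
# Five writes land while the tree is being serialized.
during = [(11, "/a", "a1"), (12, "/b", "b1"), (13, "/a", "a2"),
          (14, "/c", "c1"), (15, "/b", "b2")]

# Fuzzy snapshot: "/a" was written out after zxid 13, "/b" and "/c" before
# any of their updates, so it is a mix that never existed as a whole.
fuzzy = {"/a": "a2", "/b": "b0", "/c": "c0"}

final = dict(start)
for txn in during:
    apply(final, txn)

print(restore(10, fuzzy, during))  # {'/a': 'a2', '/b': 'b2', '/c': 'c1'}
```

A snapshot that happens to capture none of the 5 writes (the case Sergey is fine with) is simply the `start` state, and the same replay brings it forward to the same result.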
> On Tue, Jul 9, 2013 at 12:08 AM, kishore g <[EMAIL PROTECTED]> wrote:
>
> > It's not really elaborate; it is very similar to what zookeeper does
> > when it starts up. It first reads the latest snapshot file, then reads
> > the transaction logs and applies each and every transaction. What I am
> > suggesting is that, instead of applying all transactions, it stops at a
> > transaction I provide.
> >
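A sketch of the "stop at a transaction I provide" recovery, under stated assumptions. It relies on the naming described in the ZooKeeper Administrator's Guide (a `snapshot.<zxid>` suffix is the last zxid committed when the snapshot began, a `log.<zxid>` suffix is the first zxid written to that log, both in hex) and models the log as an in-memory list sorted by zxid. `snapshot_zxid`, `pick_snapshot`, and `replay_until` are hypothetical helpers, not the tool Kishore mentions.

```python
from typing import Dict, List, Optional, Tuple

# Toy transaction: (zxid, path, value), a full-value "set".
Txn = Tuple[int, str, str]

def snapshot_zxid(filename: str) -> Optional[int]:
    """Parse the hex zxid suffix of a name like 'snapshot.1e'."""
    prefix = "snapshot."
    if not filename.startswith(prefix):
        return None
    return int(filename[len(prefix):], 16)

def pick_snapshot(filenames: List[str], target: int) -> Optional[str]:
    """Return the newest snapshot whose zxid does not exceed the target."""
    best: Optional[Tuple[int, str]] = None
    for name in filenames:
        zxid = snapshot_zxid(name)
        if zxid is not None and zxid <= target:
            if best is None or zxid > best[0]:
                best = (zxid, name)
    return best[1] if best else None

def replay_until(snapshot: Dict[str, str], snap_zxid: int,
                 log: List[Txn], target: int) -> Dict[str, str]:
    """Apply logged txns with snap_zxid < zxid <= target, in zxid order."""
    tree = dict(snapshot)
    for zxid, path, value in log:
        if zxid > target:
            break  # the stop point: nothing after the target is applied
        if zxid > snap_zxid:
            tree[path] = value
    return tree

files = ["snapshot.a", "snapshot.14", "log.15", "snapshot.1e"]
print(pick_snapshot(files, 25))  # snapshot.14 (zxid 0x14 = 20)
log = [(21, "/x", "v1"), (23, "/x", "v2"), (27, "/x", "v3")]
print(replay_until({"/x": "v0"}, 20, log, 25))  # {'/x': 'v2'}
```

One caveat follows from the fuzzy-snapshot discussion in this thread: the result is exact only if the chosen snapshot contains no updates past `target`. A snapshot taken while writes were happening may already include later transactions, which this replay cannot undo, so a snapshot taken during a quiet period is the safer starting point.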
> > Having this tool will actually simplify your task: you can go back
> > to any point in time. Think of something like this:
> >
> > checkpoint A   // this can store the last zxid or timestamp from the leader
> > make changes to zk
> > // if things fail:
> > stop zks
> > rollback A     // run this on each zk; brings the cluster back to its previous state
> > start zks      // any order should be fine
> >
> >
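For reference when storing a zxid as checkpoint A: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch, so comparing zxids as plain integers orders them correctly across leader changes. The helpers below are hypothetical and only illustrate that layout; a checkpoint could record, for example, the `Zxid` value reported by the `srvr` four-letter command.

```python
from typing import Tuple

def make_zxid(epoch: int, counter: int) -> int:
    """Pack a leader epoch (high 32 bits) and a counter (low 32 bits)."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF

checkpoint_a = make_zxid(3, 0x2F)   # hypothetical zxid saved as checkpoint A
later = make_zxid(4, 0x1)           # a transaction under the next leader
print(hex(checkpoint_a))            # 0x30000002f
print(split_zxid(later))            # (4, 1)
print(checkpoint_a < later)         # True, although 0x2f > 0x1
```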
> > Also keep in mind that a snapshot is fuzzy only if there are writes
> > happening while it is being taken. If you are sure no writes will happen
> > while you are taking the snapshot, then you are good. Experts, please
> > correct me if this is incorrect.
> >
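One way to act on this advice, sketched under assumptions: read the server's last committed zxid before and after taking the snapshot, and keep the snapshot only if the zxid did not move. `last_zxid` and `take_snapshot` are hypothetical callables standing in for however a backup tool reads the zxid and copies the snapshot; both must target the same server. Since any committed transaction advances the zxid, the check is conservative: it may reject a snapshot that happened to be unaffected.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")

def snapshot_if_quiet(last_zxid: Callable[[], int],
                      take_snapshot: Callable[[], S],
                      attempts: int = 3) -> Optional[S]:
    """Take a snapshot and keep it only if no transaction committed meanwhile."""
    for _ in range(attempts):
        before = last_zxid()
        snap = take_snapshot()
        if last_zxid() == before:
            return snap  # zxid unchanged: no write could have leaked in
    return None  # writes kept arriving; fall back to snapshot + log replay

# Toy stand-ins for a server: a write lands during the first attempt only.
state = {"zxid": 10, "calls": 0}

def fake_last_zxid() -> int:
    return state["zxid"]

def fake_snapshot() -> dict:
    state["calls"] += 1
    if state["calls"] == 1:
        state["zxid"] += 1  # simulated concurrent write
    return {"attempt": state["calls"]}

print(snapshot_if_quiet(fake_last_zxid, fake_snapshot))  # {'attempt': 2}
```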
> > thanks,
> > Kishore G
> >
> >
> > On Mon, Jul 8, 2013 at 9:42 PM, Sergey Maslyakov <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Kishore,
> > >
> > > This sounds like a very elaborate tool. I was trying to find a
> > > simplistic approach, but what Thawan said about "fuzzy snapshots" makes
> > > me a little afraid that there is no simple solution.
> > >
> > >
> > > On Mon, Jul 8, 2013 at 11:05 PM, kishore g <[EMAIL PROTECTED]> wrote:
> > >
> > > > Agreed, we already have such a tool. In fact, we use it to
> > > > reconstruct the sequence of events that led to a failure, restore
> > > > the system to a previous stable point, and replay the events.
> > > > Unfortunately, it is closely tied to Helix, but it should be easy to
> > > > make it a generic tool.
> > > >
> > > > Sergey, is this something that would be useful in your case?
> > > >
> > > > Thanks,
> > > > Kishore G
> > > >
> > > >
> > > > On Mon, Jul 8, 2013 at 8:09 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On restore part, I think having a separate utility to
current.
Sergey Maslyakov 2013-07-09, 04:40
Sergey Maslyakov 2013-07-09, 04:34
Sergey Maslyakov 2013-07-09, 04:25
jack ma 2013-07-16, 15:38