Zookeeper, mail # user - Re: Efficient backup and a reasonable restore of an ensemble


Re: Efficient backup and a reasonable restore of an ensemble
Sergey Maslyakov 2013-07-09, 04:34
Sounds like a long transaction (or undo) log, which may impact the
performance.
On Mon, Jul 8, 2013 at 8:34 PM, kishore g <[EMAIL PROTECTED]> wrote:

> I think what we are looking at is point-in-time restore functionality.
> How about adding a feature that says go back to a specific zxid/timestamp?
> This way, before making any change to zookeeper, simply note down the
> timestamp/zxid on the leader. If things go wrong after making changes, bring
> down the zookeepers and provide an additional zxid/timestamp parameter while
> restarting. The server can go to the exact point and make it current. The
> followers can be started blank.
>
>
>
> On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>
> > Just saw that this is the use case corresponding to the question posted
> > on the dev list.
> >
> > In order to restore the data to a given point in time correctly, you need
> > both snapshot and txnlog. This is because zookeeper snapshot is fuzzy and
> > snapshot alone may not represent a valid state of the server if there are
> > in-flight requests.
> >
> > The 4lw command should cause the server to roll the log and take a
> > snapshot, similar to the periodic snapshotting operation. Your backup script
> > needs to grab the snapshot and the corresponding txnlog file from the data dir.
> >
> > To restore, just shut down all hosts, clear the data dir, copy over the
> > snapshot and txnlog, and restart them.
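
The file-grabbing step described above can be sketched roughly as follows. This is a minimal illustration, not part of any ZooKeeper tooling: the function names `select_backup_files` and `backup` are hypothetical, and the sketch assumes the data dir's version-2 directory holds files named `snapshot.<hex zxid>` and `log.<hex zxid>`. It also assumes that the log files worth keeping are the one with the largest starting zxid not exceeding the snapshot's zxid, plus every later log.

```python
import re
import shutil
from pathlib import Path

# Assumed naming convention: snapshot.<hex zxid> and log.<hex zxid>.
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def _zxid_files(version_dir, kind):
    """Return [(zxid, path)] for files of the given kind, sorted by zxid."""
    found = []
    for path in Path(version_dir).iterdir():
        match = _NAME.match(path.name)
        if match and match.group(1) == kind:
            found.append((int(match.group(2), 16), path))
    return sorted(found)


def select_backup_files(version_dir):
    """Pick the newest snapshot and the txnlog files that go with it."""
    snapshots = _zxid_files(version_dir, "snapshot")
    if not snapshots:
        raise FileNotFoundError("no snapshot file in %s" % version_dir)
    snap_zxid, snap_path = snapshots[-1]

    logs = _zxid_files(version_dir, "log")
    # The log that may still contain transactions around the snapshot is the
    # one with the largest starting zxid <= the snapshot zxid. Keep it and
    # every later log; if no such log exists, keep all logs.
    start = max((zxid for zxid, _ in logs if zxid <= snap_zxid), default=None)
    keep = [path for zxid, path in logs if start is None or zxid >= start]
    return [snap_path] + keep


def backup(version_dir, dest_dir):
    """Copy the selected files into dest_dir and return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in select_backup_files(version_dir):
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied
```

Restoring would then be the reverse copy into an emptied version-2 directory on each stopped host, before restarting them.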
> >
> >
> > --
> > Thawan Kooburat
> >
> >
> >
> >
> >
> > On 7/8/13 3:28 PM, "Sergey Maslyakov" <[EMAIL PROTECTED]> wrote:
> >
> > >Thank you for your response, Flavio. I apologize, I did not provide a
> > >clear explanation of the use case.
> > >
> > >This backup/restore is not intended to be tied to any write event;
> > >instead, it is expected to run as a periodic (daily?) cron job on one of
> > >the servers, which is not guaranteed to be the leader of the ensemble.
> > >There is no expectation that all recent changes are committed and
> > >persisted to disk. The system can sustain the loss of several hours'
> > >worth of recent changes in the event of a restore.
> > >
> > >As for finding the leader dynamically and performing the backup on it,
> > >this approach could be more difficult, as the leader can change from time
> > >to time and I still need to fetch the file to store it in my designated
> > >backup location. Taking the backup on one server and picking it up from
> > >the local file system looks less error-prone. Even if I went the fancy
> > >route and had Zookeeper send me the serialized DataTree in response to
> > >the 4lw, this approach would involve a lot of moving parts.
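
For reference, the dynamic leader lookup mentioned above might look something like the sketch below. It relies on the existing `srvr` four-letter word, whose output includes a `Mode:` line; the helper names (`four_letter_word`, `parse_mode`, `find_leader`) and the server list are hypothetical, and depending on the ZooKeeper version, four-letter words may need to be enabled on the server. The result is only valid at the moment of the query, since leadership can change afterwards.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter word to a server and return its full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(servers):
    """Return the first (host, port) reporting Mode: leader, else None."""
    for host, port in servers:
        try:
            reply = four_letter_word(host, port, "srvr")
        except OSError:
            continue  # unreachable server; try the next one
        if parse_mode(reply) == "leader":
            return (host, port)
    return None
```

Even with such a lookup, the backup file still has to be fetched from whichever host turns out to be the leader, which is the extra moving part the message is concerned about.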
> > >
> > >I have already made a PoC for a new 4lw that invokes takeSnapshot() and
> > >returns an absolute path to the snapshot it drops on disk. I have already
> > >protected takeSnapshot() from concurrent invocation, which is likely to
> > >corrupt the snapshot file on disk. This approach works, but I'm thinking
> > >of taking it one step further by providing the desired path name as an
> > >argument to my new 4lw and having the Zookeeper server drop the snapshot
> > >into the specified file and report success/failure back. This way I can
> > >avoid cluttering the data directory and interfering with what Zookeeper
> > >finds when it scans the data directory.
> > >
> > >The approach of having an additional server that would take the
> > >leadership and populate the ensemble is just a theory. I don't see a
> > >clean way of making a quorum member the leader of the quorum. Am I
> > >overlooking something simple?
> > >
> > >In backup and restore of an ensemble, the biggest unknown for me remains
> > >populating the ensemble with the desired data. I can think of two ways:
> > >
> > >1. Clear out all servers by stopping them, purge the version-2
> > >directories, restore a snapshot file on one server that will be brought
> > >up first, and then bring up the rest of the ensemble. This way I somewhat
> > >force the first server to be the leader because it has data and it will
> > >be the only