Zookeeper >> mail # user >> Re: Efficient backup and a reasonable restore of an ensemble
Sergey Maslyakov 2013-07-09, 04:40
Re: Efficient backup and a reasonable restore of an ensemble
That sounds like a long transaction (or undo) log, which may hurt
performance.
On Mon, Jul 8, 2013 at 8:34 PM, kishore g <[EMAIL PROTECTED]> wrote:

> I think what we are looking for is point-in-time restore functionality.
> How about adding a feature that says "go back to a specific zxid/timestamp"?
> That way, before making any change to ZooKeeper, you simply note down the
> timestamp/zxid on the leader. If things go wrong after making the changes,
> bring down the ZooKeeper servers and provide an additional zxid/timestamp
> parameter when restarting. The server can go to that exact point and make
> it current. The followers can be started blank.
>
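The "go back to a specific zxid" idea above can be pictured as replaying a transaction log on top of a snapshot and stopping at the chosen zxid. The sketch below is a toy model only, not ZooKeeper's recovery code: the `Txn` record, the `"set"`/`"delete"` operation names, and `restore_to` are hypothetical names invented for illustration. The one ZooKeeper fact it relies on is that a zxid is a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Txn(NamedTuple):
    """Hypothetical log record used only by this sketch."""
    zxid: int                      # epoch in the high 32 bits, counter in the low 32
    op: str                        # "set" or "delete" (made-up op names)
    path: str
    data: Optional[bytes] = None


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a ZooKeeper-style zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def restore_to(snapshot: Dict[str, bytes],
               txns: Iterable[Txn],
               target_zxid: int) -> Dict[str, bytes]:
    """Replay txns in zxid order on a copy of snapshot, stopping after target_zxid."""
    state = dict(snapshot)         # never mutate the caller's snapshot
    for txn in sorted(txns, key=lambda t: t.zxid):
        if txn.zxid > target_zxid:
            break                  # everything past the target is discarded
        if txn.op == "set":
            state[txn.path] = txn.data
        elif txn.op == "delete":
            state.pop(txn.path, None)
        else:
            raise ValueError(f"unknown op: {txn.op!r}")
    return state
```

Because both toy operations are idempotent, replaying a transaction that the snapshot already reflects does no harm, which is the property that makes a fuzzy snapshot plus its log recoverable. The cost noted in the reply above still applies: the log has to be retained back to the oldest point you want to return to.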
>
>
> On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <[EMAIL PROTECTED]> wrote:
>
> > Just saw that this is the use case corresponding to the question posted
> > on the dev list.
> >
> > In order to restore the data to a given point in time correctly, you need
> > both the snapshot and the txnlog. This is because a ZooKeeper snapshot is
> > fuzzy, and a snapshot alone may not represent a valid state of the server
> > if there are in-flight requests.
> >
> > The 4lw command should cause the server to roll the log and take a
> > snapshot, similar to the periodic snapshotting operation. Your backup script
> > needs to grab the snapshot and the corresponding txnlog file from the data dir.
> >
> > To restore, just shut down all hosts, clear the data dir, copy over the
> > snapshot and txnlog, and restart them.
> >
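A backup script following the advice above has to decide which files in the data dir belong together. The sketch below is one possible selection rule, offered under assumptions rather than as the project's own tooling: it assumes the usual `snapshot.<hex zxid>` and `log.<hex zxid>` file names found under `version-2`, and it conservatively keeps the last log that starts at or before the newest snapshot plus every later log. The function name `files_for_backup` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed naming scheme: "snapshot.<hex zxid>" and "log.<hex zxid>".
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def files_for_backup(names: List[str]) -> Tuple[str, List[str]]:
    """Pick the newest snapshot and the txnlog files needed to replay past it."""
    snapshots, logs = [], []
    for name in names:
        match = _NAME.match(name)
        if not match:
            continue                               # ignore unrelated files
        entry = (int(match.group(2), 16), name)    # (zxid, file name)
        (snapshots if match.group(1) == "snapshot" else logs).append(entry)
    if not snapshots:
        raise FileNotFoundError("no snapshot.<zxid> file found")

    snap_zxid, snap_name = max(snapshots)
    logs.sort()

    # A log is named after its first zxid, so the log that may still hold
    # transactions racing with the snapshot is the last one that starts at
    # or before the snapshot's zxid. Keep that one and everything newer.
    start = 0
    for index, (zxid, _) in enumerate(logs):
        if zxid <= snap_zxid:
            start = index
    return snap_name, [name for _, name in logs[start:]]
```

A caller would typically pass `os.listdir()` of the `version-2` directory and copy the returned files to the backup location; copying one log more than strictly necessary is harmless, while copying one too few can leave the fuzzy snapshot unrecoverable.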
> >
> > --
> > Thawan Kooburat
> >
> >
> >
> >
> >
> > On 7/8/13 3:28 PM, "Sergey Maslyakov" <[EMAIL PROTECTED]> wrote:
> >
> > >Thank you for your response, Flavio. I apologize, I did not provide a
> > >clear explanation of the use case.
> > >
> > >This backup/restore is not intended to be tied to any write event.
> > >Instead, it is expected to run as a periodic (daily?) cron job on one of
> > >the servers, which is not guaranteed to be the leader of the ensemble.
> > >There is no expectation that all recent changes are committed and
> > >persisted to disk. The system can sustain the loss of several hours'
> > >worth of recent changes in the event of a restore.
> > >
> > >As for finding the leader dynamically and performing the backup on it,
> > >this approach could be more difficult because the leader can change from
> > >time to time and I still need to fetch the file to store it in my
> > >designated backup location. Taking the backup on one server and picking
> > >it up from a local file system looks less error-prone. Even if I went the
> > >fancy route and had ZooKeeper send me the serialized DataTree in response
> > >to the 4lw, this approach would involve a lot of moving parts.
> > >
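For comparison, finding the leader dynamically usually means sending an existing four-letter word such as `srvr` to each server and reading the `Mode:` line of the reply. The sketch below shows that client side under a few assumptions: the server closes the connection after answering, the command is enabled on the target server (newer releases may restrict which four-letter words are allowed), and the helper names `four_letter_word` and `server_mode` are hypothetical.

```python
import socket
from typing import Optional


def four_letter_word(host: str, port: int, word: str, timeout: float = 5.0) -> str:
    """Send a four-letter word and return the full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(srvr_output: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line (e.g. 'leader'), or None if absent."""
    for line in srvr_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Mode":
            return value.strip()
    return None
```

A cron job could call `server_mode(four_letter_word(host, 2181, "srvr"))` for each ensemble member (2181 being the conventional client port) and act only where the result is `"leader"`. The concern raised above still stands: the answer can be stale by the time the backup runs, and the file still has to be fetched from whichever host turned out to be the leader.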
> > >I have already made a PoC for a new 4lw that invokes takeSnapshot() and
> > >returns an absolute path to the snapshot it drops on disk. I have already
> > >protected takeSnapshot() from concurrent invocation, which is likely to
> > >corrupt the snapshot file on disk. This approach works, but I'm thinking
> > >of taking it one step further by providing the desired path name as an
> > >argument to my new 4lw and having the ZooKeeper server drop the snapshot
> > >into the specified file and report success/failure back. This way I can
> > >avoid cluttering the data directory and interfering with what ZooKeeper
> > >finds when it scans the data directory.
> > >
> > >The approach of having an additional server that would take the
> > >leadership and populate the ensemble is just a theory. I don't see a clean
> > >way of making a quorum member the leader of the quorum. Am I overlooking
> > >something simple?
> > >
> > >In backup and restore of an ensemble, the biggest unknown for me remains
> > >populating the ensemble with the desired data. I can think of two ways:
> > >
> > >1. Clear out all servers by stopping them, purge the version-2 directories,
> > >restore a snapshot file on one server that will be brought up first, and
> > >then bring up the rest of the ensemble. This way I somewhat force the first
> > >server to be the leader because it has data and it will be the only