Zookeeper >> mail # user >> Backups


User error is a valid use case.  Are we assuming that, because of the user
error, ZK is not usable at that point? If not, can someone please explain how
having a backup can actually restore the data without bringing all ZK
servers down and without disrupting the clients?

If we really want to take care of user error, then what we probably need is
a way to go back to the state just before the transaction that messed up the
ZK state. Can we not achieve this by providing a tool that generates a
snapshot and transaction log such that, when the server is restarted, it
starts exactly from that point? We could do this simply by using the existing
snapshot files and transaction logs from any of the servers. Do we really
need a separate backup, given that the data is available on multiple servers?

We need a way to generate a snapshot that takes us to an exact point in time
(identified either by timestamp or by transaction number). One problem I see
is that ZK probably can't go back in transaction number.
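
To make the idea concrete, here is a rough sketch of the replay step such a
tool would need. It is only a toy model, not ZooKeeper code: the Txn record,
the dict-based "snapshot", and the replay_until function are hypothetical
names made up for illustration, and it assumes the snapshot is consistent as
of snapshot_zxid (real ZooKeeper snapshots are fuzzy, so an actual tool would
have to deal with that). It also does not address the problem above of the
ensemble refusing to go back in transaction number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical, simplified log record (not ZooKeeper's on-disk format)."""
    zxid: int                     # transaction id, increasing in log order
    op: str                       # "create", "set", or "delete"
    path: str
    data: Optional[bytes] = None


def replay_until(
    snapshot: Dict[str, bytes],
    snapshot_zxid: int,
    log: List[Txn],
    bad_zxid: int,
) -> Tuple[Dict[str, bytes], int]:
    """Rebuild the state as it was just before bad_zxid.

    Assumes `snapshot` reflects every transaction up to and including
    snapshot_zxid. Returns the rebuilt state and the last zxid applied.
    """
    if snapshot_zxid >= bad_zxid:
        raise ValueError("snapshot already contains the bad transaction")

    state = dict(snapshot)        # never mutate the caller's snapshot
    last_applied = snapshot_zxid
    for txn in sorted(log, key=lambda t: t.zxid):
        if txn.zxid <= snapshot_zxid:
            continue              # already reflected in the snapshot
        if txn.zxid >= bad_zxid:
            break                 # stop just before the offending transaction
        if txn.op in ("create", "set"):
            state[txn.path] = txn.data if txn.data is not None else b""
        elif txn.op == "delete":
            state.pop(txn.path, None)
        else:
            raise ValueError(f"unknown op: {txn.op}")
        last_applied = txn.zxid
    return state, last_applied


# Example: transaction 103 is the accidental delete we want to undo.
snapshot = {"/app": b"", "/app/config": b"v1"}
log = [
    Txn(101, "set", "/app/config", b"v2"),
    Txn(102, "create", "/app/lock", b""),
    Txn(103, "delete", "/app/config"),
    Txn(104, "set", "/app", b"x"),
]
state, last_zxid = replay_until(snapshot, 100, log, bad_zxid=103)
```

Note that everything after the bad transaction (104 here) is dropped as well,
so any legitimate updates made after the mistake would be lost too.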

Thoughts?
On Thu, Jan 19, 2012 at 11:42 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> That is one important case.  The offsite backup condition is probably well
> handled by a listener.
>
> On Thu, Jan 19, 2012 at 7:30 PM, Flavio Junqueira <[EMAIL PROTECTED]>
> wrote:
>
> > You're not talking about data corruption, are you? It is incorrect data
> > that has been introduced by a user or application by mistake. Am I getting
> > it right?
> >
> > -Flavio
> >
> >
> > On Jan 19, 2012, at 8:07 PM, Jordan Zimmerman wrote:
> >
> >> It's that very replication that creates the need for backups. If there is
> >> a user error or a bad injection of data, the error will quickly replicate
> >> to all the instances. There's no way to recover without an external
> >> backup.
> >>
> >>
> >> -JZ
> >>
> >>
> >> On 1/19/12 10:39 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Ted, Znodes for leader election, group membership, etc, can all be
> >>> recreated, so why should I back them up instead of recreating the
> >>> znodes? In fact, one might bring back a previous snapshot of the
> >>> system that reflects an incorrect system state.
> >>>
> >>> In the case that one stores data that can't be recovered by other
> >>> means, I understand the need, but then we have the durability problem
> >>> that I mentioned and you apparently agreed with. Also, ZooKeeper is a
> >>> replicated service, so why can't you simply rely upon the replication
> >>> strategy that ZooKeeper provides to you already? Again, I'm trying to
> >>> understand the use cases here.
> >>>
> >>> Thanks,
> >>> -Flavio
> >>>
> >>> On Jan 19, 2012, at 7:11 PM, Ted Dunning wrote:
> >>>
> >>>> A backup can still be useful.  It is a common property that a database
> >>>> backup is known to be slightly out of date.
> >>>>
> >>>> Such a backup can still be very useful.  In many systems, the most common
> >>>> cause of error is simple human intervention.  This especially applies to
> >>>> file systems and databases, but can still apply to ZK if an admin
> >>>> carelessly tries to clean up part of the namespace and accidentally cleans
> >>>> up all of it.  This should be much less common with ZK because manual
> >>>> adjustments are so much less a part of standard operation, but they can
> >>>> still occur.  In these cases, an out-of-date backup may be enormously
> >>>> valuable.
> >>>>
> >>>> If somebody wants a precise backup from a particular moment in time, the
> >>>> best option is to use the snapshot capabilities exposed by various file
> >>>> systems.  Traditional NAS vendors all support this.  At a lower cost and
> >>>> complexity point, you can get this from MapR clusters exposed as NFS or by
> >>>> a ZFS file system.  This option also allows you to keep multiple snapshots
> >>>> from points in the past.
> >>>>
> >>>> What Jordan is doing would allow backups without special storage devices
> >>>> and, with good backup of the log, would allow nearly current