|
Jordan Zimmerman
2012-01-13, 22:24
Camille Fournier
2012-01-13, 22:28
Jordan Zimmerman
2012-01-13, 22:29
Patrick Hunt
2012-01-17, 01:39
Jordan Zimmerman
2012-01-17, 01:42
Camille Fournier
2012-01-17, 01:49
Neha Narkhede
2012-01-17, 18:10
Jordan Zimmerman
2012-01-17, 19:06
Jordan Zimmerman
2012-01-18, 00:38
Flavio Junqueira
2012-01-19, 11:07
Jordan Zimmerman
2012-01-19, 17:32
Ted Dunning
2012-01-19, 18:11
Jordan Zimmerman
2012-01-19, 18:16
Ted Dunning
2012-01-19, 18:23
Patrick Hunt
2012-01-19, 18:24
Flavio Junqueira
2012-01-19, 18:39
Jordan Zimmerman
2012-01-19, 19:07
Flavio Junqueira
2012-01-19, 19:30
Jordan Zimmerman
2012-01-19, 19:32
Ted Dunning
2012-01-19, 19:40
Ted Dunning
2012-01-19, 19:41
Ted Dunning
2012-01-19, 19:42
kishore g
2012-01-20, 07:42
Patrick Hunt
2012-01-20, 17:01
|
-
Backups Jordan Zimmerman 2012-01-13, 22:24
As a backup strategy, it seems I would only want to back up snapshots from
the leader. Does that make sense?

-JZ
-
Re: Backups Camille Fournier 2012-01-13, 22:28
You want to have to figure out who the leader is every time you want to
take a backup? That would be the downside to this strategy, I would think.

C
-
Re: Backups Jordan Zimmerman 2012-01-13, 22:29
That's easy, as the backup app is running on the same machine as the ZK
instance. I can use 'stat' to see if "my" instance is the leader.
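For readers who want to see what the 'stat' check might look like, here is a minimal sketch. It assumes the server answers the four-letter `stat` command on the client port and that the reply contains a `Mode:` line (for example `Mode: leader`). The function names, the default host, and port 2181 are illustrative assumptions, and newer ZooKeeper releases may require the command to be explicitly enabled.

```python
import socket
from typing import Optional


def parse_mode(stat_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line in 'stat' output, or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_stat(host: str = "127.0.0.1", port: int = 2181, timeout: float = 5.0) -> str:
    """Send the four-letter word 'stat' and read the reply until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def local_instance_is_leader(host: str = "127.0.0.1", port: int = 2181) -> bool:
    """True only if the queried server reports itself as the leader."""
    return parse_mode(query_stat(host, port)) == "leader"
```

A backup job running next to the server could call `local_instance_is_leader()` and skip the run when it returns False. A standalone server reports a different mode, so a real tool would need to decide how to treat that case.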
-
Re: Backups Patrick Hunt 2012-01-17, 01:39
Why would you limit to the leader? Wouldn't backing up any server (as
long as it's active) be sufficient? If you search the list, it's been discussed before; using Observers seemed like a reasonable option as well.

Patrick
-
Re: Backups Jordan Zimmerman 2012-01-17, 01:42
We'll be backing up to S3. Wouldn't it be redundant to back up all the
instances?

-JZ

P.S. I'm working on a ZooKeeper instance manager that will have backup/restore and a bunch of other stuff. We'll be open sourcing it. If anyone is interested in previewing it, let me know.
-
Re: Backups Camille Fournier 2012-01-17, 01:49
Not all the instances, any instance. Which would be a bit simpler than
having to look for the leader and then back up only that one.

C
-
Re: Backups Neha Narkhede 2012-01-17, 18:10
Jordan,

I'd be interested in previewing it. Let me know.

Thanks,
Neha
-
Re: Backups Jordan Zimmerman 2012-01-17, 19:06
OK - I'll give you access to the repo as soon as it's in a reasonable
state.
-
Re: Backups Jordan Zimmerman 2012-01-18, 00:38
Neha - can you send me your email address? Send it to:
[EMAIL PROTECTED]
-
Re: Backups Flavio Junqueira 2012-01-19, 11:07
Since you started this thread, I've been thinking about the idea of
backing up, and I'm not sure I understand the motivation or whether it is OK to violate safety properties.

Given that ZooKeeper is used for coordination, I would think that in many cases all its state can be reconstructed in an algorithmic manner. Perhaps the use case for a backup would be the one in which it is being used as a database, for example, to keep the metadata of a file system. Periodic backups, or even keeping an observer, however, won't guarantee that if you bring the system up using that backup you'll have all committed operations. The state of the leader reflects all committed operations, but one needs to have the latest state of the transaction log to not miss an update.

But it is true that I'm assuming that you can't miss updates. If you can miss updates, then that's a different story. By missing updates we'll be violating durability, which is a property that ZooKeeper is supposed to provide, so I'm trying to understand in which cases violating durability would be acceptable. If it is not acceptable and you still want to have a backup, then I don't see a way other than shutting down the clients before you take a backup, which doesn't seem to be what is being proposed here.

-Flavio
-
Re: Backups Jordan Zimmerman 2012-01-19, 17:32
I think you've made some vital points here, Flavio. ZK is mostly used for
coordination, but it can also be used for sequential number generation, metadata storage (as you mention), etc. The thing that I overlooked is that it's only this data that is a backup candidate, i.e., ZK paths used for locks, leaders, etc. should _not_ be backed up.

I'm going to rethink my backup strategy. One idea is to back up certain specified ZK paths (anything used for metadata). These "backups" could be done by using the ZK API to read the nodes/data and storing it somewhere. A restore, then, is just a rewrite of that stored data. A ZK 3.4 transaction could be used to ensure atomicity.

-JZ
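The path-based idea above (read selected znodes through the API, store them, rewrite them on restore) can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool discussed in the thread: `client` is any object exposing `get(path)` returning `(data, stat)`, `get_children(path)`, `exists(path)`, `set(path, data)` and `create(path, data)`, which is the shape of the kazoo Python client. The function names and the JSON/base64 storage format are hypothetical choices.

```python
import base64
import json


def backup_subtree(client, root):
    """Read every znode under root; return {path: base64-encoded data}."""
    saved = {}
    stack = [root]
    while stack:
        path = stack.pop()
        data, _stat = client.get(path)
        saved[path] = base64.b64encode(data or b"").decode("ascii")
        for child in client.get_children(path):
            stack.append(path.rstrip("/") + "/" + child)
    return saved


def backup_to_json(client, root):
    """Serialize the subtree so it can be stored anywhere (a file, S3, ...)."""
    return json.dumps(backup_subtree(client, root))


def restore_subtree(client, saved):
    """Rewrite saved znodes, parents before children (shallowest paths first)."""
    for path in sorted(saved, key=lambda p: (p.count("/"), p)):
        data = base64.b64decode(saved[path])
        if client.exists(path):
            client.set(path, data)
        else:
            client.create(path, data)


def restore_from_json(client, text):
    restore_subtree(client, json.loads(text))
```

Note that this sketch is not atomic: the reads are not a point-in-time view and the restore writes are issued one by one. Wrapping the writes in a multi-operation transaction, as suggested above, would be the natural next step. It also assumes the parent of `root` already exists on restore.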
-
Re: Backups Ted Dunning 2012-01-19, 18:11
A backup can still be useful. It is a common property that a database
backup is known to be slightly out of date.

Such a backup can still be very useful. In many systems, the most common cause of error is simple human intervention. This especially applies to file systems and databases, but can still apply to ZK if an admin carelessly tries to clean up part of the namespace and accidentally cleans up all of it. This should be much less common with ZK because manual adjustments are so much less a part of standard operation, but they can still occur. In these cases, an out-of-date backup may be enormously valuable.

If somebody wants a precise backup from a particular moment in time, the best option is to use the snapshot capabilities exposed by various file systems. Traditional NAS vendors all support this. At a lower cost and complexity point, you can get this from MapR clusters exposed as NFS or by a ZFS file system. This option also allows you to keep multiple snapshots from points in the past.

What Jordan is doing would allow backups without special storage devices and, with good backup of the log, would allow nearly current recovery in the event of catastrophic loss. Yes, this loses some durability, but it is still very desirable.
-
Re: Backups Jordan Zimmerman 2012-01-19, 18:16
Ted - are you referring to my original plan to back up the transaction logs
or the new idea of backing up certain nodes?

-JZ

On 1/19/12 10:11 AM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:
>What Jordan is doing would allow backups without special storage devices
>and, with good backup of the log, would allow nearly current recovery in
>the event of catastrophic loss. Yes, this loses some durability, but it is
>still very desirable.
-
Re: Backups Ted Dunning 2012-01-19, 18:23
Backing up the transaction logs.
-
Re: Backups Patrick Hunt 2012-01-19, 18:24
I don't think we have a "backup" story today. "copy the datadir" is
not a great story. You could, for example, get snaps/txnlogs that are only partially written. Now this is fine from the perspective that a ZK server can recover from that, but at the end of the day it's pretty ugly. It also requires you to copy the entire datadir, and not just the most recent "known good" snap/txnlog file(s). Not to mention the issue we talked about before: you're getting a copy from some unknown point in time, largely defined by how up to date the server is with the leader.

It seems to me that if we really want to support backing up the servers we need a better story than this. Perhaps some tool which can ask a server to generate a "backup" (but only if it's reasonably up to date with the leader, most importantly that it's actually active in the ensemble, etc.), ensure that the file creation happened successfully (i.e., verify the output files), then copy that result, rather than the "copy the datadir" approach we have today.

Patrick
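As a rough illustration of copying only the most recent snap/txnlog files instead of the entire datadir, the sketch below picks files by name. It assumes the usual on-disk naming, `snapshot.<hex zxid>` and `log.<hex zxid>`, and the function name is hypothetical. It does not verify that the chosen files are completely written, which is exactly the gap pointed out above, so treat it as a starting point only.

```python
import re

# Assumed naming convention: snapshot.<hex zxid> and log.<hex zxid>
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def select_backup_files(filenames):
    """Return the newest snapshot plus the transaction logs needed to replay past it."""
    snaps, logs = [], []
    for name in filenames:
        match = _NAME.match(name)
        if not match:
            continue  # ignore myid, lock files, and anything else
        zxid = int(match.group(2), 16)
        (snaps if match.group(1) == "snapshot" else logs).append((zxid, name))
    if not snaps:
        return []
    snap_zxid, snap_name = max(snaps)
    logs.sort()
    # The last log that starts at or before the snapshot may still hold
    # transactions newer than the snapshot, so keep it too.
    older = [entry for entry in logs if entry[0] <= snap_zxid]
    newer = [entry for entry in logs if entry[0] > snap_zxid]
    return [snap_name] + [name for _, name in older[-1:] + newer]
```

Because the newest snapshot might be only partially written, a more careful tool could fall back to the previous snapshot, or ask the server to produce and verify a backup as proposed above.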
-
Re: Backups Flavio Junqueira 2012-01-19, 18:39
Hi Ted, Znodes for leader election, group membership, etc., can all be
recreated, so why should I back them up instead of recreating the znodes? In fact, one might bring back a previous snapshot of the system that reflects an incorrect system state.

In the case that one stores data that can't be recovered by other means, I understand the need, but then we have the durability problem that I mentioned and that you apparently agreed with. Also, ZooKeeper is a replicated service, so why can't you simply rely upon the replication strategy that ZooKeeper provides to you already? Again, I'm trying to understand the use cases here.

Thanks,
-Flavio
-
Re: Backups Jordan Zimmerman 2012-01-19, 19:07
It's that very replication that creates the need for backups. If there is
a user error or a bad injection of data, the error will quickly replicate to all the instances. There's no way to recover without an external backup.

-JZ
-
Re: BackupsFlavio Junqueira 2012-01-19, 19:30
You're not talking about data corruption, are you? It is incorrect
data that has been introduced by a user or application by mistake. Am I getting it right?

-Flavio

On Jan 19, 2012, at 8:07 PM, Jordan Zimmerman wrote:

> It's that very replication that creates the need for backups. If there is
> a user error or a bad injection of data, the error will quickly replicate
> to all the instances. There's no way to recover without an external
> backup.
>
> -JZ
-
Re: BackupsJordan Zimmerman 2012-01-19, 19:32
Correct

On 1/19/12 11:30 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:

>You're not talking about data corruption, are you? It is incorrect
>data that has been introduced by a user or application by mistake. Am
>I getting it right?
>
>-Flavio
-
Re: BackupsTed Dunning 2012-01-19, 19:40
Well, "snap the datadir" is definitely one step better than "copy the
datadir", but your point is well taken that this isn't the strongest story.

At least snaps take essentially zero time and are guaranteed to be
consistent copies of files. Most snaps are also incremental, which means
that they tend to copy less than everything. Also, any time for a snapshot
will be arbitrary in some sense.

The problem that I see with writing a copy from the server is that we can't
guarantee that all the values get written at the same time. If we had an
immutable table structure (I suggested this once, Thomas K suggested it
also), then this wouldn't be a big deal since we would just make a snapshot
of the table to write. With our current data structure, this isn't nice.

It also isn't nice to increase the load on the server just to get a backup.

On Thu, Jan 19, 2012 at 6:24 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:

> I don't think we have a "backup" story today. "copy the datadir" is
> not a great story. You could for example get snaps/txnlogs that are
> only partially written. Now this is fine from the perspective that a
> ZK server can recover from that, but EOD it's pretty ugly. Also
> requires you to copy the entire datadir, and not just the most recent
> "known good" snap/txnlog file(s). Not to mention the issue we talked
> about before - you're getting a copy from some unknown point in time,
> largely defined by how up to date the server is with the leader. It
> seems to me that if we really want to support backing up the servers
> we need a better story than this. Perhaps some tool which can ask a
> server to generate a "backup" (but only if it's reasonably up to date
> with the leader, most importantly that it's actually active in the
> ensemble, etc...), ensure that the file creation happened successfully
> (ie verify the output files), then copy that result, rather than the
> "copy the datadir" approach we have today.
>
> Patrick
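A minimal sketch of the immutable-table idea mentioned in this message, assuming a toy in-memory table rather than ZooKeeper's actual data tree. The names `ImmutableTable` and `write_backup` are hypothetical and exist only for illustration. Each write publishes a new read-only mapping, so a backup that grabbed the previous mapping sees one consistent point in time even while writes continue. The full copy on every write is a simplification; a real design would presumably share structure between versions.

```python
import json
import threading
from types import MappingProxyType


class ImmutableTable:
    """Toy copy-on-write table: every write publishes a new read-only mapping."""

    def __init__(self):
        self._lock = threading.Lock()          # serializes writers only
        self._current = MappingProxyType({})   # never mutated once published

    def set(self, path, data):
        with self._lock:
            updated = dict(self._current)      # full copy (simplification)
            updated[path] = data
            self._current = MappingProxyType(updated)

    def snapshot(self):
        # One reference read; the mapping returned here never changes later.
        return self._current


def write_backup(snapshot):
    """Serialize a snapshot; later writes to the table cannot affect it."""
    return json.dumps(dict(snapshot), sort_keys=True)


table = ImmutableTable()
table.set("/config/a", "1")
snap = table.snapshot()          # backup starts here
table.set("/config/a", "2")      # writes that arrive during the backup
table.set("/config/b", "3")
backup = write_backup(snap)      # still reflects the state at snapshot time
```

The backup contains only `/config/a` with its original value, while the live table has moved on. This addresses the "all values written at the same time" concern without blocking writers, though it does nothing about the extra load a backup puts on the server.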
-
Re: BackupsTed Dunning 2012-01-19, 19:41
Flavio,

Take as a use case the one where I am keeping configuration files in ZK.
These will be manually installed and thus subject to manual error. Backups
would be invaluable.

On Thu, Jan 19, 2012 at 6:39 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> Hi Ted, Znodes for leader election, group membership, etc, can all be
> recreated, so why should I back them up instead of recreating the znodes?
> In fact, one might bring back a previous snapshot of the system that
> reflects an incorrect system state.
>
> In the case that one stores data that can't be recovered by other means, I
> understand the need, but then we have the durability problem that I
> mentioned and you apparently agreed. Also, ZooKeeper is a replicated
> service, so why can't you simply rely upon the replication strategy that
> ZooKeeper provides to you already? Again, I'm trying to understand the use
> cases here.
>
> Thanks,
> -Flavio
-
Re: BackupsTed Dunning 2012-01-19, 19:42
That is one important case. The offsite backup condition is probably well
handled by a listener.

On Thu, Jan 19, 2012 at 7:30 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> You're not talking about data corruption, are you? It is incorrect data
> that has been introduced by a user or application by mistake. Am I getting
> it right?
>
> -Flavio
-
Re: Backupskishore g 2012-01-20, 07:42
User error is a valid use case. Are we assuming that, because of user error,
ZK is not usable at this point? If not, can someone please explain how having a
backup can actually restore the data without bringing all ZK servers down and
without disrupting the clients?

If we really want to take care of user error, then what we need is probably a
way to go back to the state just before the transaction that messed up the ZK
state. Can we not achieve this by providing a tool that generates a snapshot
and transaction log such that, when the server is restarted, it starts exactly
from that transaction? We can do this by simply using the existing snapshot
files and transaction logs from any of the servers. Do we really need a
separate backup, since the data is available on multiple servers? We need a way
to generate a snapshot that will take us to the exact point in time (using
either a timestamp or a transaction number). One problem I see is that ZK
probably can't go back in transaction number.

Thoughts?

On Thu, Jan 19, 2012 at 11:42 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> That is one important case. The offsite backup condition is probably well
> handled by a listener.
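A rough sketch of the first step such a tool might take: choosing which existing files to start from for a given target transaction. It assumes the usual datadir naming, where snapshots are `snapshot.<hex zxid>` (last zxid applied when the snapshot began) and transaction logs are `log.<hex zxid>` (first zxid in that log). The function names are hypothetical, no such tool is described in this thread, and truncating the final log at the target transaction is not shown.

```python
def parse_zxid(name):
    """Return the zxid encoded as the hex suffix of 'snapshot.<hex>' or 'log.<hex>'."""
    return int(name.split(".", 1)[1], 16)


def files_for_restore(names, target_zxid):
    """Pick the newest snapshot at or before target_zxid and the logs to replay."""
    snaps = sorted((parse_zxid(n), n) for n in names if n.startswith("snapshot."))
    logs = sorted((parse_zxid(n), n) for n in names if n.startswith("log."))

    usable = [entry for entry in snaps if entry[0] <= target_zxid]
    if not usable:
        raise ValueError("no snapshot at or before the target zxid")
    snap_zxid, snap_name = usable[-1]

    # Replay has to begin with the log that holds transaction snap_zxid + 1:
    # the newest log whose first zxid is not beyond that transaction.
    first_needed = snap_zxid + 1
    earlier = [entry for entry in logs if entry[0] <= first_needed]
    later = [entry for entry in logs if first_needed < entry[0] <= target_zxid]
    needed = earlier[-1:] + later
    return snap_name, [name for _, name in needed]


datadir = ["snapshot.0", "log.1", "snapshot.64", "log.65", "snapshot.c8", "log.c9"]
chosen = files_for_restore(datadir, 0x96)   # target falls between two snapshots
```

Here `chosen` is `("snapshot.64", ["log.65"])`: the newest snapshot not past the target, plus the log that continues from it. Selecting files this way does not address the concern about going back in transaction number; it only shows that the inputs for a point-in-time restore are already on disk.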
-
Re: BackupsPatrick Hunt 2012-01-20, 17:01
On Thu, Jan 19, 2012 at 11:42 PM, kishore g <[EMAIL PROTECTED]> wrote:
> User error is a valid use case. Are we assuming that because of user error
> the ZK is not usable at this point? if not, can some one please explain how
> having a back up can actually restore the data without bringing all zk
> servers down and not disrupting the clients.

Not sure what you mean by "zk is not usable". User error meaning someone
deleted/corrupted the cluster state at the server level and the service cannot
be started/restarted? Or just that someone messed up the znodes via some user
api operation?

re "backup/restore ... not disrupting the clients": that's not possible via
some backup type operation at the server level. The clients track the version
of the service and will not come up on a server that's behind the zxid last
seen by that client (ie no going back in time). Doing so would invalidate all
kinds of zk guarantees. Anyway, if you're in this state you're in big trouble
anyway.

You can see this today: sometimes people will start a zk cluster and some
clients, then decide they want to wipe the znode data space. They do this by
stopping the service, wiping the datadir, and restarting. They are then
surprised when the clients (which are not restarted) cannot reconnect to the
service.

Patrick
|
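The "no going back in time" rule can be illustrated with a toy model. This is only a sketch of the comparison described above, not ZooKeeper's actual connection code; the function name and the zxid values are hypothetical.

```python
def server_is_acceptable(server_last_zxid, client_last_seen_zxid):
    """A client must never observe state older than what it has already seen."""
    return server_last_zxid >= client_last_seen_zxid


client_last_seen = 0x500   # hypothetical: highest zxid a long-running client has seen

wiped_server = 0x0         # datadir wiped and service restarted
healthy_server = 0x520     # server that kept its history

can_reconnect_after_wipe = server_is_acceptable(wiped_server, client_last_seen)
can_reconnect_normally = server_is_acceptable(healthy_server, client_last_seen)
```

In this model the client that was never restarted cannot use the wiped server, which matches the surprise described in the message. The same comparison is why restoring an older backup underneath running clients disrupts them.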