Zookeeper, mail # user - Backups


Re: Backups
Jordan Zimmerman 2012-01-19, 17:32
I think you've made some vital points here, Flavio. ZK is mostly used for
coordination, but it can also be used for sequential number generation,
metadata storage (as you mention), etc. What I overlooked is that only
this kind of data is a candidate for backup; ZK paths used for locks,
leader election, etc. should _not_ be backed up.

I'm going to rethink my backup strategy. One idea is to back up only
specific ZK paths (anything used for metadata). These "backups" could be
made by using the ZK API to read the nodes and their data and storing the
result somewhere. A restore is then just a rewrite of that stored data. A
ZK 3.4.4 transaction (multi-op) could be used to ensure atomicity.

-JZ
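A minimal sketch of the path-based backup described above, in Python. It assumes a client object exposing kazoo-style `get(path)`, `get_children(path)` and `transaction()` calls (the Python kazoo library offers calls with these names). `backup_paths`, `restore_paths`, `to_json`, `from_json` and the `InMemoryClient` stand-in are hypothetical names for illustration only. Lock and leader paths are skipped simply by not listing them as roots. The restore assumes the parents of each backed-up root already exist and the restored nodes do not.

```python
import base64
import json


def backup_paths(client, roots):
    """Read every node under the given metadata roots, parents before children.

    Lock and leader-election paths are excluded by never listing them
    in `roots`. Returns an insertion-ordered {path: data} dict.
    """
    snapshot = {}
    stack = list(reversed(roots))
    while stack:
        path = stack.pop()
        data, _stat = client.get(path)  # assumed kazoo-style: (bytes, stat)
        snapshot[path] = data or b""
        # Push children in reverse so they are popped in sorted order.
        for child in sorted(client.get_children(path), reverse=True):
            stack.append(path.rstrip("/") + "/" + child)
    return snapshot


def to_json(snapshot):
    """Serialize for storage elsewhere; node data is base64-encoded."""
    return json.dumps({p: base64.b64encode(d).decode("ascii") for p, d in snapshot.items()})


def from_json(text):
    return {p: base64.b64decode(v) for p, v in json.loads(text).items()}


def restore_paths(client, snapshot):
    """Re-create every node in one multi-op transaction: all or nothing."""
    txn = client.transaction()
    for path, data in snapshot.items():  # parent-first order from backup_paths
        txn.create(path, data)
    return txn.commit()


class InMemoryClient:
    """Hypothetical stand-in exposing only the calls used above."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def get(self, path):
        return self.nodes[path], None

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return [p[len(prefix):] for p in self.nodes
                if p.startswith(prefix) and p != prefix and "/" not in p[len(prefix):]]

    def transaction(self):
        client, ops = self, []

        class Txn:
            def create(self, path, value=b""):
                ops.append((path, value))

            def commit(self):
                staged = dict(client.nodes)  # apply to a copy, swap in only on success
                for path, value in ops:
                    parent = path.rsplit("/", 1)[0] or "/"
                    if path in staged or parent not in staged:
                        raise ValueError(f"cannot create {path}")
                    staged[path] = value
                client.nodes = staged
                return [path for path, _ in ops]

        return Txn()
```

With a real ensemble, the same functions would receive a connected client instead of the in-memory stand-in, and the JSON text would be written to whatever store holds the backups.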

On 1/19/12 3:07 AM, "Flavio Junqueira" <[EMAIL PROTECTED]> wrote:

>Since you started this thread, I've been thinking about the idea of
>backing up, and I'm not sure I understand the motivation, or whether it
>is OK to violate safety properties.
>
>Given that ZooKeeper is used for coordination, I would think that in
>many cases all its state can be reconstructed in an algorithmic
>manner. Perhaps the use case for a backup would be the one in which it
>is being used as a database, for example, to keep the metadata of a
>file system. Periodic backups or even keeping an observer, however,
>won't guarantee that if you bring the system up using that backup
>you'll have all committed operations. The state of the leader reflects
>all committed operations, but one needs to have the latest state of
>the transaction log to not miss an update.
>
>But it is true that I'm assuming you can't miss updates. If you can
>miss updates, then that's a different story. Missing updates violates
>durability, which is a property that ZooKeeper is supposed to provide,
>so I'm trying to understand in which cases violating durability would
>be acceptable. If it is not acceptable and you still want a backup, then
>I don't see a way other than shutting down the clients before you take
>the backup, which doesn't seem to be what is being proposed here.
>
>-Flavio
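Flavio's durability point can be made concrete with a toy model. This is a simplification, not ZooKeeper's on-disk format. The model rests on three assumptions:

- State is a map from path to value.
- Every committed write is a log entry with an increasing zxid.
- A restore starts from a snapshot and replays the log entries newer than it.

`LogEntry` and `restore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    zxid: int    # transaction id; higher means later
    path: str
    value: bytes


def restore(snapshot_state, snapshot_zxid, txn_log):
    """Start from a snapshot and replay every logged write newer than it."""
    state = dict(snapshot_state)
    for entry in sorted(txn_log, key=lambda e: e.zxid):
        if entry.zxid > snapshot_zxid:
            state[entry.path] = entry.value
    return state


log = [LogEntry(1, "/fs/a", b"v1"), LogEntry(2, "/fs/b", b"v1"), LogEntry(3, "/fs/a", b"v2")]
committed = restore({}, 0, log)              # everything the ensemble has committed
snap_at_2 = restore({}, 0, log[:2])          # backup taken right after zxid 2
snapshot_only = restore(snap_at_2, 2, [])    # restore without the transaction log
with_log = restore(snap_at_2, 2, log)        # restore plus the latest log
```

A backup holding only the snapshot loses the committed write at zxid 3. Replaying the latest transaction log recovers it, which is why the latest log state matters.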
>
>
>On Jan 18, 2012, at 1:38 AM, Jordan Zimmerman wrote:
>
>> Neha, can you send me your email address? Send it to:
>> [EMAIL PROTECTED]
>>
>> On 1/17/12 10:10 AM, "Neha Narkhede" <[EMAIL PROTECTED]> wrote:
>>
>>> Jordan,
>>>
>>> I'd be interested in previewing it. Let me know.
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Mon, Jan 16, 2012 at 5:42 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>> We'll be backing up to S3. Wouldn't it be redundant to back up all
>>>> the instances?
>>>>
>>>> -JZ
>>>>
>>>> P.S. I'm working on a ZooKeeper instance manager that will have
>>>> backup/restore and a bunch of other stuff. We'll be open-sourcing it.
>>>> If anyone is interested in previewing it, let me know.
>>>>
>>>>
>>>> On 1/16/12 5:39 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Why would you limit it to the leader? Wouldn't backing up any server
>>>>> (as long as it's active) be sufficient? If you search the list, you'll
>>>>> see it's been discussed before; using Observers seemed like a
>>>>> reasonable option as well.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Fri, Jan 13, 2012 at 2:29 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>> That's easy, as the backup app runs on the same machine as the ZK
>>>>>> instance. I can use 'stat' to see if "my" instance is the leader.
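The 'stat' check described here can be sketched in Python. The sketch sends the four-letter word over a plain TCP connection and reads the `Mode:` line from the reply. It assumes the conventional client port 2181 on localhost. The function names are hypothetical. On ZooKeeper 3.5.3 and later, four-letter words must be enabled via `4lw.commands.whitelist`, which by default allows only `srvr`. `srvr` also reports `Mode:`.

```python
import socket


def four_letter_word(host="localhost", port=2181, cmd=b"stat", timeout=5.0):
    """Send a four-letter command and return the text reply.

    The server closes the connection after answering, so read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(reply):
    """Extract the Mode value (leader, follower, observer, standalone), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def is_leader(host="localhost", port=2181):
    return server_mode(four_letter_word(host, port)) == "leader"
```

Leadership can move between this check and the backup itself. That is part of the downside raised in the reply that follows.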
>>>>>>
>>>>>> On 1/13/12 2:28 PM, "Camille Fournier" <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> You want to have to figure out who the leader is every time you want
>>>>>>> to take a backup? That would be the downside to this strategy, I
>>>>>>> would think.
>>>>>>>
>>>>>>> C
>>>>>>>
>>>>>>> From my phone
>>>>>>> On Jan 13, 2012 5:24 PM, "Jordan Zimmerman" <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> As a backup strategy, it seems I would only want to back up snapshots
>>>>>>>> from the leader. Does that make sense?
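A snapshot-based backup comes down to choosing which files to copy. ZooKeeper names snapshot files `snapshot.<zxid>` and transaction log files `log.<zxid>`, with the zxid in hexadecimal. These files live under a `version-2` subdirectory of `dataDir`, or of `dataLogDir` for logs. ZooKeeper takes snapshots while updates continue, so a snapshot alone can be stale.

This sketch keeps the newest snapshot, the log file that may contain the transactions just after it, and every later log. It assumes a log file's name carries the first zxid it contains. `files_to_back_up` is a hypothetical helper. The names it receives would come from listing both directories.

```python
import re

# Matches data file names such as "snapshot.2a0" or "log.150" (hex zxid).
_DATA_FILE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def _by_zxid(names, kind):
    """Return [(zxid, name)] for files of the given kind, oldest first."""
    found = []
    for name in names:
        match = _DATA_FILE.match(name)
        if match and match.group(1) == kind:
            found.append((int(match.group(2), 16), name))
    return sorted(found)


def files_to_back_up(names):
    """Newest snapshot plus every log file that may hold transactions after it."""
    snapshots = _by_zxid(names, "snapshot")
    if not snapshots:
        return []
    snap_zxid, snap_name = snapshots[-1]
    logs = _by_zxid(names, "log")
    # The newest log starting at or before the snapshot zxid may also contain
    # transactions that come after it, so start copying from that log.
    starts = [zxid for zxid, _ in logs if zxid <= snap_zxid]
    first = starts[-1] if starts else -1
    return [snap_name] + [name for zxid, name in logs if zxid >= first]
```

When snapshots and logs share one directory, a call might look like `files_to_back_up(os.listdir(os.path.join(data_dir, "version-2")))`. Here `data_dir` stands in for the configured `dataDir`.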