ZooKeeper user mailing list: DR policies/HA setup in production - best practices


Thread:
  Sergei Babovich    2010-12-22, 16:20
  Mahadev Konar      2011-01-03, 20:05
  Sergei Babovich    2011-01-03, 20:58
  Ted Dunning        2011-01-03, 21:43
  Mahadev Konar      2011-01-03, 22:31
Re: DR policies/HA setup in production - best practices
Thanks a lot! Really helped!
On 01/03/2011 05:31 PM, Mahadev Konar wrote:
> Sergei,
>   I think Ted already answered your question but in case you are interested in
> more details, please take a look at
>
> http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperInternals.html
>
> Thanks
> mahadev
>
>
> On 1/3/11 1:43 PM, "Ted Dunning"<[EMAIL PROTECTED]>  wrote:
>
>
>> Actually, ZK is very good in this regard.
>>
>> The lifetime of a single leader is denoted by an epoch number.  Transactions
>> are identified by an epoch and a sequence number assigned by the leader.
>> Since there is only one leader and because all transactions are executed
>> serially, this
>> combination of epoch and transaction id uniquely specifies a transaction and
>> provides a complete ordering.
>>
>> As transactions are committed, members of the committing quorum record the
>> latest epoch and transaction.
>>
>> When you restart a cluster, the members of the cluster negotiate to
>> determine who has the latest transaction and then start from there.  As
>> such, it is probably a good idea to back up more than just one log+snapshot
>> so that you have a better chance of having a later copy.
>>
>> On Mon, Jan 3, 2011 at 12:58 PM, Sergei Babovich
>> <[EMAIL PROTECTED]>wrote:
>>
>>
>>> The point about the DR strategy is understood as well. What is the mechanism
>>> for ZK to resolve conflicts in such a case? Let's say we have a primitive
>>> backup strategy of shipping logs every hour. In theory it means (assuming the
>>> worst case) that on the DR site all servers will have snapshots of the data
>>> made at different points in time. When I bring the DR cluster up, what is the
>>> protocol for resolving inconsistencies? That was the reason for my question -
>>> it felt (maybe naively) that recovering by replicating from a single node's
>>> data (snapshot+log) would be a safer and more consistent approach - it is
>>> easier to make guarantees about the result.
>>>
>>
>
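
For concreteness, here is a minimal sketch of the ordering Ted describes: a
ZooKeeper transaction id (zxid) is a single 64-bit value whose high 32 bits
hold the leader's epoch and whose low 32 bits hold a counter the leader bumps
per transaction, so comparing zxids numerically orders transactions first by
epoch and then by sequence. The class and method names below are illustrative
only, not ZooKeeper API.

// ZxidSketch.java - illustrative only; assumes the standard
// high-32-bits = epoch, low-32-bits = counter layout described in the
// ZooKeeper internals documentation.
public class ZxidSketch {

    // Pack an epoch and a per-epoch counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xffffffffL; }

    public static void main(String[] args) {
        long a = makeZxid(5, 42);   // epoch 5, 42nd transaction of that leader
        long b = makeZxid(6, 1);    // a later leader's first transaction

        // Numeric comparison of zxids orders first by epoch, then by counter,
        // which is the total ordering described above.
        System.out.printf("a=0x%016x (epoch %d, counter %d)%n",
                a, epochOf(a), counterOf(a));
        System.out.printf("b=0x%016x (epoch %d, counter %d)%n",
                b, epochOf(b), counterOf(b));
        System.out.println("b is later: " + (Long.compareUnsigned(b, a) > 0));
    }
}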

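On Ted's suggestion to keep more than one log+snapshot backup: the server names
its snapshot files snapshot.<last applied zxid in hex>, so a rough way to pick
the freshest backup copy is to compare those embedded zxids. The sketch below
is an assumption-laden illustration, not a supported tool: it assumes each
backup is a plain copy of the dataDir's version-2 directory passed as a
command-line argument, and a real restore decision would also look at the
transaction logs (log.<hex zxid>) that extend past the newest snapshot.

// LatestBackup.java - hedged sketch for comparing backed-up dataDir copies.
import java.io.File;
import java.util.Arrays;

public class LatestBackup {

    // Highest zxid found among snapshot.<hex> files in one backup directory,
    // or -1 if none are present.
    static long latestSnapshotZxid(File dir) {
        File[] files = dir.listFiles((d, name) -> name.startsWith("snapshot."));
        if (files == null || files.length == 0) return -1L;
        return Arrays.stream(files)
                .mapToLong(f -> Long.parseUnsignedLong(
                        f.getName().substring("snapshot.".length()), 16))
                .max().orElse(-1L);
    }

    public static void main(String[] args) {
        File best = null;
        long bestZxid = -1L;
        for (String path : args) {            // each arg: one backup directory
            long zxid = latestSnapshotZxid(new File(path));
            System.out.printf("%s -> latest snapshot zxid 0x%x%n", path, zxid);
            if (zxid > bestZxid) { bestZxid = zxid; best = new File(path); }
        }
        System.out.println("Restore candidate: " + best);
    }
}

Run with the backup directories as arguments, e.g. java LatestBackup
/backups/hour01 /backups/hour02; the copy reporting the highest zxid is the
most recent state you could restore from.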