Raj N 2012-06-14, 14:56
Jonathan Simms 2012-06-14, 17:03
Raj N 2012-06-14, 18:56
Patrick Hunt 2012-06-15, 18:17
Raj N 2012-06-15, 19:45
Patrick Hunt 2012-06-16, 00:33
There are some corner cases that could lead you to lose data depending on your settings, even if forceSync is enabled. For example, if your disk write cache is enabled, some sequences of events could lead you to lose updates: updates that were forced to disk could still be lost locally, and depending on how many copies exist across servers, they may not be recoverable.
Options I'm aware of to get around this are to use write barriers, battery-backed RAID controllers, or another solution that uses some form of non-volatile memory. I must also say that I'm not aware of any such case happening in production use. We did observe it in lab experiments, though.
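
To make the point about forced writes concrete, here is a minimal Java sketch of what "forcing an update to disk" looks like at the application level. It is not ZooKeeper's actual transaction log code; the class name ForcedAppend, the method appendAndForce, and the temp-file name are hypothetical and chosen only for illustration. The only real API relied on is java.nio's FileChannel.force. Even when force() returns, the data may still be sitting in a volatile disk write cache, which is the corner case described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not taken from the ZooKeeper code base.
public class ForcedAppend {

    // Appends one record to the log file and asks the OS to flush it
    // to the storage device before returning the resulting file size.
    static long appendAndForce(Path log, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Flush file content and metadata out of the OS buffers.
            // This hands the data to the device; if the device has a
            // volatile write cache, a power loss can still drop it.
            ch.force(true);
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("txnlog-sketch", ".bin");
        long size = appendAndForce(log, "update-1".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes in file after force: " + size);
        Files.delete(log);
    }
}
```

The application cannot tell from the return of force() whether the bytes reached stable media, which is why the mitigations have to come from below the application: write barriers, a battery-backed controller, or some other non-volatile cache.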
On Jun 16, 2012, at 2:33 AM, Patrick Hunt wrote:
> On Fri, Jun 15, 2012 at 12:45 PM, Raj N <[EMAIL PROTECTED]> wrote:
>> Can zookeeper recover from a
>> corrupt transaction log using existing snapshots and then replaying
>> messages from its peers?
> A server will try to recover as best it can (using the snaps/logs it
> has available), and then talk to the other servers in the quorum to
> see if anyone else has a more recent committed change. In the case
> where it doesn't it will download what's necessary to get in sync with
> the new leader.
> What might have happened in your case is that you hit a bug, perhaps a
> type of corruption that we don't handle successfully. e.g. see
Raj N 2012-06-16, 16:05
Flavio Junqueira 2012-06-18, 14:41
Mahadev Konar 2012-06-15, 22:27
Patrick Hunt 2012-06-16, 00:27
Vitalii Tymchyshyn 2012-06-18, 12:55