Flavio Junqueira 2012-06-18, 14:41
I haven't closely followed Linux kernel development, but my understanding from reading blog posts here and there is that the write barrier implementation is not fully reliable, so you may still lose data. At the same time, it adds significant performance overhead. Consequently, I would think that you're better off relying on the BBWC and turning off the write barrier.
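Before changing anything, it may help to confirm what the filesystem is actually mounted with. The following is a minimal sketch, not a definitive check: it assumes the ext4 option spellings `barrier`, `barrier=1`, `barrier=0` and `nobarrier`, assumes barriers are on when no option is listed (the ext4 default mentioned in this thread), and uses hypothetical device names and mount points in the sample text. How the option is displayed can differ between kernel versions, so treat the result as a hint.

```python
def find_mount_options(mounts_text, mount_point):
    """Return the option string for mount_point from /proc/mounts-style text, or None."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            result = fields[3]  # a later entry for the same mount point wins
    return result


def barriers_enabled(mount_options, default=True):
    """Guess whether write barriers are enabled from a comma-separated option string.

    `default` is the assumed behavior when no barrier option is listed.
    """
    enabled = default
    for opt in mount_options.split(","):
        opt = opt.strip()
        if opt in ("nobarrier", "barrier=0"):
            enabled = False
        elif opt in ("barrier", "barrier=1"):
            enabled = True
    return enabled


# Hypothetical sample in /proc/mounts format; devices and paths are made up.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext3 rw,data=ordered 0 0\n"
    "/dev/sdb1 /data ext4 rw,nobarrier,data=ordered 0 0\n"
)

options = find_mount_options(SAMPLE_MOUNTS, "/data")
print(options, barriers_enabled(options))
```

On a real host, the same functions could be fed the contents of `/proc/mounts` and the mount point holding the ZooKeeper transaction log directory.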
On Jun 16, 2012, at 6:05 PM, Raj N wrote:
> Mahadev, to answer your question, yes, we get significantly better
> performance with forceSync=no. In fact, Patrick is probably right. You
> probably ran the tests when the bug existed. It was one of my team members
> who reported the bug where forceSync=no did not work.
> A couple more facts. We use the ext4 filesystem (default options) on RHEL
> 2.6.18-238.el5 (note it's not el6, so ext4 is backported; ext3 is the
> default on el5). We use 500GB SAS drives with BBWC(1 GB, DWC disabled). But
> somehow still my performance with forceSync=yes is not the best. I have
> been thinking it might be because of the default options in ext4 which
> enable the barrier. The barrier essentially makes the BBWC useless. I
> think I can safely disable the barrier since I have BBWC. I haven't tried
> this out yet. But what do you guys think?
> On Sat, Jun 16, 2012 at 7:25 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
>> There are some corner cases that could lead you to lose data depending on
>> your setting, even if forceSync is enabled. For example, if your disk write
>> cache is enabled, then there are some sequences of events that could lead
>> you to lose updates. With the disk write cache enabled, updates forced to
>> disk could be lost locally, and depending on how many copies exist across
>> servers, they may not be recoverable.
>> Options I'm aware of to get around this are to use write barriers,
>> battery-backed RAID controllers, or another solution that uses some form of
>> non-volatile memory. I must also say that I'm not aware of any such case
>> happening in production use. We observed it in lab experiments, though.
>> On Jun 16, 2012, at 2:33 AM, Patrick Hunt wrote:
>>> On Fri, Jun 15, 2012 at 12:45 PM, Raj N <[EMAIL PROTECTED]> wrote:
>>>> Can zookeeper recover from a
>>>> corrupt transaction log using existing snapshots and then replaying
>>>> messages from its peers?
>>> A server will try to recover as best it can (using the snaps/logs it
>>> has available), and then talk to the other servers in the quorum to
>>> see if anyone else has a more recent committed change. In the case
>>> where it doesn't, it will download what's necessary to get in sync with
>>> the new leader.
>>> What might have happened in your case is that you hit a bug, perhaps a
>>> type of corruption that we don't handle successfully. e.g. see