ZooKeeper user mailing list: sync vs. async vs. multi performances

Re: sync vs. async vs. multi performances
Hi Ariel,

That wiki is stale. Check it here:


In particular, check slide 57 of the HIC talk. We were using 1k byte writes for those tests.


On Feb 15, 2012, at 12:18 AM, Ariel Weisberg wrote:

> Hi,
> I tried to look at the presentations on the wiki, but the links don't seem
> to work. I was using
> http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations, and the
> error at the top of the page is "You are not allowed to do AttachFile on
> this page. Login and try again."
> I used http://pastebin.com/uu7igM3J, and the results for 4k writes are at
> http://pastebin.com/N26CJtQE: 8.5 milliseconds, which is a bit slower than
> the 5 ms you mentioned. Is it possible to beat the rotation speed?
> You can increase the write size quite a bit, up to 240k, and it only rises
> to 10 milliseconds (http://pastebin.com/MSTwaHYN).
> My recollection was that it was in the 12-14 ms range, but I may be thinking
> of when I was pushing throughput.
> Ariel
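
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of a synchronous write-plus-fsync latency benchmark. It is not the pastebin script referenced above, whose contents are not reproduced here. The function name, default iteration count, and temporary-file location are illustrative assumptions. As the thread points out, the numbers depend heavily on whether the drive's write cache is on and whether write barriers are enabled.

```python
import os
import statistics
import tempfile
import time
from typing import Optional


def fsync_latency_ms(write_size: int, iterations: int = 50,
                     path: Optional[str] = None) -> float:
    """Median latency (ms) of appending `write_size` bytes and calling os.fsync.

    Illustrative sketch only: `path` defaults to a fresh temporary file, which
    is removed afterwards. Point it at the disk you actually want to measure.
    """
    payload = b"x" * write_size
    created = path is None
    if created:
        handle, path = tempfile.mkstemp()
        os.close(handle)
    samples = []
    # Append-only writes, similar in spirit to a transaction log.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            view = memoryview(payload)
            while view:  # os.write may write fewer bytes than requested
                view = view[os.write(fd, view):]
            # Ask the OS to flush to the device; a drive write cache may still absorb it.
            os.fsync(fd)
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        if created:
            os.remove(path)
    return statistics.median(samples)
```

Comparing `fsync_latency_ms(4096)` with `fsync_latency_ms(240 * 1024)` on the same disk gives a rough sense of how much of the latency is fixed per fsync and how much scales with write size.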
> On Tue, Feb 14, 2012 at 4:02 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
>> Some of our previous measurements gave us around 5 ms; check some of the
>> presentations we uploaded to the wiki. Those use 7.2k RPM disks, not only
>> volatile storage or a battery-backed cache. We do have the write cache
>> on for the numbers I'm referring to. There are also numbers there for when
>> the write cache is off.
>> -Flavio
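
The 5 ms figure and the question about beating the rotation speed can be put in context with simple arithmetic. One platter revolution takes 60/RPM seconds. The average wait for a random sector to come under the head is half a revolution. The sketch below computes both values for the 5.4k and 7.2k RPM disks mentioned in this thread. It ignores seek time, sequential-append effects, and caching, so it gives only rough intuition and is not a model of fsync. Measured latencies well below these figures are consistent with the point made in this thread that the write is probably landing in a cache.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_delay_ms(rpm: float) -> float:
    """Expected wait for a random sector to rotate under the head: half a revolution."""
    return revolution_ms(rpm) / 2.0


for rpm in (5400, 7200):
    print(f"{rpm} RPM: revolution {revolution_ms(rpm):.2f} ms, "
          f"average rotational delay {avg_rotational_delay_ms(rpm):.2f} ms")
# 5400 RPM: revolution 11.11 ms, average rotational delay 5.56 ms
# 7200 RPM: revolution 8.33 ms, average rotational delay 4.17 ms
```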
>> On Feb 14, 2012, at 9:48 PM, Ariel Weisberg wrote:
>>> Hi,
>>> It's only a minute if you process each region serially. Process 100 or
>>> 1000 in parallel and it will go a lot faster.
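
A minimal sketch of the serial-versus-parallel suggestion, assuming a hypothetical `write_region_state` function that stands in for one synchronous ZooKeeper write per region. It does not call any real client API. It simply sleeps for the 20 ms round trip quoted in the next paragraph. The sleep-based stand-in is optimistic because it assumes the server absorbs the extra concurrency for free. With a real ensemble, the gain comes from the server making several outstanding writes durable per fsync.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List

SIMULATED_RTT_S = 0.02  # assumption: 20 ms per synchronous write, as quoted in this thread


def write_region_state(region_id: int) -> int:
    """Hypothetical stand-in for one synchronous write of a region's state.

    It only sleeps for the assumed round-trip time; a real client call would go here.
    """
    time.sleep(SIMULATED_RTT_S)
    return region_id


def reassign_serially(regions: Iterable[int]) -> List[int]:
    # One write at a time: total time is roughly len(regions) * round trip.
    return [write_region_state(r) for r in regions]


def reassign_in_parallel(regions: Iterable[int], max_in_flight: int = 100) -> List[int]:
    # Keep up to `max_in_flight` writes outstanding at once.
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(write_region_state, regions))
```

Under these assumptions, `reassign_serially(range(300))` takes about 6 seconds. `reassign_in_parallel(range(300))` finishes in a small fraction of that.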
>>> 20 milliseconds to synchronously commit to a 5.4k RPM disk is about
>>> right, assuming the configuration is correct. On ext3 you need to mount
>>> with barrier=1 (ext4 and XFS enable write barriers by default). If
>>> someone is getting significantly faster numbers, they are probably writing
>>> to a volatile or battery-backed cache.
>>> Performance is relative. The number of operations the DB can do is
>>> roughly constant, although multi may be able to batch operations more
>>> efficiently by amortizing all the coordination overhead.
>>> In the synchronous case the DB is starved for work 99% of the time, so it
>>> is not surprising that it is slow. You are benchmarking round-trip time in
>>> that case, and that is dominated by the time it takes to synchronously
>>> commit something to disk.
>>> In the asynchronous case there is plenty of work, and you can fully
>>> utilize all the available throughput because each fsync makes multiple
>>> operations durable. However, the work is still presented piecemeal, so
>>> there is per-operation overhead.
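
A toy group-commit model can illustrate the contrast between the synchronous and asynchronous cases. It makes two assumptions. First, an fsync costs a fixed `fsync_us` regardless of how many log entries it covers, which is roughly what the 4k vs. 240k measurements quoted above suggest. Second, the server spends `prep_us` preparing each operation. Both parameters and their values are illustrative, and this is not ZooKeeper's actual request pipeline. In the model, a synchronous client pays for one fsync per operation. With a pipelined client, a single logger makes every operation that is ready at that moment durable in one fsync.

```python
import bisect
from typing import Tuple


def sync_client_us(n_ops: int, fsync_us: int, prep_us: int) -> int:
    """Synchronous client: each operation waits for its own fsync before the next is sent."""
    return n_ops * (prep_us + fsync_us)


def async_client_us(n_ops: int, fsync_us: int, prep_us: int) -> Tuple[int, int]:
    """Pipelined client with group commit (toy model, integer microseconds).

    All operations are submitted at t=0 and the server finishes preparing op i
    at (i + 1) * prep_us. One logger repeatedly fsyncs everything ready so far.
    Returns (total time, number of fsyncs).
    """
    ready = [(i + 1) * prep_us for i in range(n_ops)]
    t, done, fsyncs = 0, 0, 0
    while done < n_ops:
        t = max(t, ready[done])               # wait until at least one op is pending
        done = bisect.bisect_right(ready, t)  # batch every op that is ready by now
        t += fsync_us                         # a single fsync covers the whole batch
        fsyncs += 1
    return t, fsyncs
```

Take 1000 operations with `fsync_us=20_000` and `prep_us=100`. The synchronous client needs 20,100,000 microseconds, about 20 s. The pipelined client finishes in 120,100 microseconds using 6 fsyncs.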
>>> Caveat: I am on 3.3.3, so I haven't read how multi operations are
>>> implemented, but the numbers you are getting bear this out. In the multi
>>> case you get the benefit of keeping the DB fully utilized, plus the
>>> coordination overhead is amortized across multiple operations, so you get
>>> a boost in throughput beyond plain async.
>>> Ariel
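
The amortization argument above can be written as a back-of-the-envelope formula. Assume each client request costs `request_us` of coordination work, paid once per request, plus `op_us` for every operation it carries. Sending operations in multi batches of size `batch_size` then pays `request_us` once per batch instead of once per operation. The split and the example values are assumptions for illustration only, and the way ZooKeeper actually processes a multi request is not modeled. Following the reasoning above, once the pipeline is full this per-operation work, rather than fsync, is what bounds async throughput.

```python
import math


def server_work_us(n_ops: int, batch_size: int, request_us: int, op_us: int) -> int:
    """Total server-side work when n_ops are sent in requests of batch_size ops.

    batch_size=1 models plain async calls; batch_size>1 models multi.
    Assumed cost model: each request pays request_us once, plus op_us per op it holds.
    """
    n_requests = math.ceil(n_ops / batch_size)
    return n_requests * request_us + n_ops * op_us


for batch in (1, 10, 100):
    print(batch, server_work_us(1000, batch, request_us=80, op_us=20))
# 1 100000
# 10 28000
# 100 20800
```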
>>> On Tue, Feb 14, 2012 at 3:37 PM, N Keywal <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>> Thanks for the replies.
>>>> It's used when assigning the regions (a kind of dataset) to a
>>>> regionserver (a JVM process on a physical server). There is one ZooKeeper
>>>> node per region. On a server failure, there are typically a few hundred
>>>> regions to reassign, with multiple statuses written in . On paper, if we
>>>> need 0.02 s per node, recovery takes about a minute, just for ZooKeeper.
>>>> That's the theory; I haven't done a precise measurement yet.
>>>> Anyway, if ZooKeeper can be faster, it's always very interesting :-)
>>>> Cheers,
>>>> N.
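
The back-of-the-envelope recovery estimate above can be written out explicitly. The message only says "a few hundred" regions and "multiple" statuses, so the 300 regions and 10 status writes per region below are hypothetical. The 0.02 s per write is the figure quoted in the message. The estimate assumes every write is done one after another.

```python
def serial_recovery_s(regions: int, writes_per_region: int,
                      seconds_per_write: float = 0.02) -> float:
    """Estimated ZooKeeper time if every status write is issued serially."""
    return regions * writes_per_region * seconds_per_write


# Hypothetical example: 300 regions with 10 status writes each.
print(serial_recovery_s(300, 10))
```

With these assumed counts, the result is 60 seconds, in line with the "about a minute" estimate in the message.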
>>>> On Tue, Feb 14, 2012 at 8:00 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>>>> These results are about what is expected although they might be a little
