OZAWA Tsuyoshi
2011-09-23, 13:08
Andrew Purtell
2011-09-23, 16:16
OZAWA Tsuyoshi
2011-09-23, 16:45
Ted Dunning
2011-09-23, 20:52
Edward Capriolo
2011-09-23, 23:38
OZAWA Tsuyoshi
2011-09-24, 01:44
OZAWA Tsuyoshi
2011-09-24, 01:54
Ryan Rawson
2011-09-24, 02:09
OZAWA Tsuyoshi
2011-09-24, 05:38
Flavio Junqueira
2011-09-24, 21:43
OZAWA Tsuyoshi
2011-09-25, 07:02
OZAWA Tsuyoshi
2011-09-25, 08:14
Ted Dunning
2011-09-25, 11:14
Flavio Junqueira
2011-09-25, 21:49
Ted Dunning
2011-09-25, 22:26
Tsuyoshi OZAWA
2011-09-26, 02:18
Ted Dunning
2011-09-26, 11:56
Tsuyoshi OZAWA
2011-09-27, 02:19
Tsuyoshi OZAWA
2011-09-27, 02:30
-
[announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-23, 13:08
Hi,

I am sending this to the zookeeper-users and hbase-users mailing lists since there may be cluster developers there who are interested in participating in this project.

I am pleased to announce the initial release of Accord, yet another coordination service like Apache ZooKeeper. As you know, ZooKeeper is currently the de facto standard coordination kernel. Accord provides ZK-like features as a coordination service. Concretely, its features are:
- Accord is a distributed, transactional, and fully replicated (no SPoF) key-value store with strong consistency.
- Accord can scale out to tens of nodes.
- Accord servers can handle tens or thousands of clients.
- Changes made by a client's write request can be notified to the other clients.
- Accord detects clients joining and leaving, and notifies the other clients of which clients joined or left.

ZK has some problems, however:
- ZK cannot handle write-intensive workloads well. ZK forwards all write requests to a master server, which may become a bottleneck in write-intensive workloads.
- ZK is optimized for disk-persistence mode, not for in-memory mode. ZOOKEEPER-866 shows that ZK has another bottleneck besides disk persistence, even though there is a need for fully replicated storage with both strong consistency and low latency.
  https://issues.apache.org/jira/browse/ZOOKEEPER-866
- Limited transaction APIs. ZK can only issue write operations (write, del) in a transaction (multi-update).

These restrictions limit the capability of the coordination kernel. Accord solves these problems:
1. Accord uses the Corosync Cluster Engine as its total-order messaging infrastructure instead of Zab, the atomic broadcast protocol that ZK uses. The engine enables any server to accept and process requests.
2. Accord supports an in-memory mode.
3. More flexible transaction support. Not only write and del operations, but also cmp, copy, and read operations are supported in a transaction.

These differences in the core engine (1, 2) enable us to avoid the master bottleneck. Benchmarks show that the write throughput of Accord is much higher than that of ZooKeeper (up to 20 times higher in persistent mode, and up to 18 times higher in in-memory mode).

The high-performance kernel can extend the range of applications. Expected applications include, for instance:
- A distributed lock manager whose lock operations occur at high frequency from thousands of clients. I have the lock manager for HBase in particular in mind. The coordination service enables HBase to update multiple rows with ACID properties. HBase acts as a distributed DB with ACID properties until the coordination service becomes the bottleneck. The new coordination kernel, Accord, can handle up to 18 times the throughput of ZK. As a result, Accord can dramatically improve the scalability of HBase with ACID properties.
- A metadata management service for large-scale distributed storage, including HDFS, Ceph, and Sheepdog. A replicated master can be implemented easily.
- A replicated message queue or logger (for instance, a replicated RabbitMQ).

Other distributed systems can use Accord's features easily because Accord provides general-purpose APIs (read/write/del/more flexible transactions).

More information, including a getting-started guide, benchmarks, and API docs, is available from our project page:
http://www.osrg.net/accord
and all code is available from:
http://github.com/collie/accord

Please try it out, and let me know about any opinions or problems.

Best regards,
OZAWA Tsuyoshi
<[EMAIL PROTECTED]>
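The idea in point 1 of the announcement (any server accepts a request, and a total-order messaging layer delivers it to every replica in the same order) can be sketched as a toy replicated state machine. This is only an illustration under stated assumptions: `Replica` and `TotalOrderBus` are hypothetical names, the bus is a single in-process object, and nothing here reflects the Corosync or Accord APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server holding a full copy of the key-value store."""
    name: str
    store: dict = field(default_factory=dict)

    def deliver(self, op):
        # Apply an operation handed over by the ordering layer.
        kind, key, value = op
        if kind == "write":
            self.store[key] = value
        elif kind == "del":
            self.store.pop(key, None)


class TotalOrderBus:
    """Hypothetical stand-in for a total-order messaging layer."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # the single agreed order of operations

    def broadcast(self, origin, op):
        # Any replica may originate a request; the bus fixes one global
        # order and delivers each operation to every replica in that order.
        self.log.append((origin.name, op))
        for replica in self.replicas:
            replica.deliver(op)


replicas = [Replica("a"), Replica("b"), Replica("c")]
bus = TotalOrderBus(replicas)
bus.broadcast(replicas[0], ("write", "/lock", "client-1"))
bus.broadcast(replicas[2], ("write", "/lock", "client-2"))
bus.broadcast(replicas[1], ("del", "/tmp", None))
```

Because every replica applies the same operations in the same order, all three stores end up identical even though the requests entered through different servers. A real ordering layer also has to handle membership changes and failures, which this sketch ignores.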
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Andrew Purtell 2011-09-23, 16:16
Some code seems licensed under the GPLv2, some under the LGPL.
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-23, 16:45
You are right.
Currently, the client library (libacrd) is licensed under the LGPL, and the server-side daemon code (conductor) is licensed under the GPLv2.

Best regards,
- OZAWA Tsuyoshi
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Ted Dunning 2011-09-23, 20:52
This is not correct. You can mix and match reads, writes and version checks in a multi.

2011/9/23 OZAWA Tsuyoshi <[EMAIL PROTECTED]>
> - Limited Transaction APIs. ZK can only issue write operations (write,
> del) in a transaction(multi-update).
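To make the point about combining version checks with writes in one multi concrete, here is a minimal sketch of all-or-nothing semantics. It is a toy in-memory model, not ZooKeeper's client API: `VersionedStore`, the op tuples, and the version numbering are assumptions made for illustration.

```python
class VersionedStore:
    """Toy in-memory store; each path maps to (value, version)."""

    def __init__(self):
        self.data = {}

    def multi(self, ops):
        """Apply every op or none. Returns True on commit, False on abort."""
        staged = dict(self.data)  # work on a copy so an abort leaves no trace
        for op in ops:
            kind, path = op[0], op[1]
            if kind == "check":
                # Abort unless the node exists at the expected version.
                if path not in staged or staged[path][1] != op[2]:
                    return False
            elif kind == "set":
                # Create the node at version 0 or bump the existing version.
                old_version = staged[path][1] if path in staged else -1
                staged[path] = (op[2], old_version + 1)
            elif kind == "delete":
                if path not in staged:
                    return False
                del staged[path]
            else:
                raise ValueError(f"unknown op: {kind}")
        self.data = staged  # commit all staged changes at once
        return True


store = VersionedStore()
store.multi([("set", "/config", "v1")])
committed = store.multi(
    [("check", "/config", 0), ("set", "/config", "v2"), ("set", "/owner", "alice")]
)
aborted = store.multi([("check", "/config", 0), ("set", "/owner", "bob")])
```

The second call commits because `/config` is still at version 0. The third call aborts because the version has moved on, so `/owner` keeps the value written by the committed transaction.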
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Edward Capriolo 2011-09-23, 23:38
The cages library http://code.google.com/p/cages/ seems to be similar.
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-24, 01:44
Thank you for pointing that out.

What I meant is that Accord can read multiple nodes atomically in one transaction.

Accord also supports a server-side compare operation (scmp) between path1 and path2, which reduces network traffic and latency.
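A rough sketch of what a transaction that mixes reads, writes, and a server-side compare could look like follows. The op names (`read`, `write`, `del`, `copy`, `cmp`, `scmp`) come from the thread, but their exact semantics here are assumptions, and `run_transaction` is a hypothetical helper rather than part of libacrd.

```python
def run_transaction(store, ops):
    """Run ops atomically against a dict. Returns read results, or None on abort."""
    staged = dict(store)  # changes become visible only if every op succeeds
    results = []
    for op in ops:
        kind = op[0]
        if kind == "read":
            results.append(staged.get(op[1]))
        elif kind == "write":
            staged[op[1]] = op[2]
        elif kind == "del":
            staged.pop(op[1], None)
        elif kind == "copy":
            # Copy the value at the source path to the destination path.
            if op[1] not in staged:
                return None
            staged[op[2]] = staged[op[1]]
        elif kind == "cmp":
            # Abort unless the stored value equals the value the client sent.
            if staged.get(op[1]) != op[2]:
                return None
        elif kind == "scmp":
            # Server-side compare: both operands are paths, so no value
            # has to travel from the client to the server.
            p1, p2 = op[1], op[2]
            if p1 not in staged or p2 not in staged or staged[p1] != staged[p2]:
                return None
        else:
            raise ValueError(f"unknown op: {kind}")
    store.clear()
    store.update(staged)
    return results


store = {"/a": "x", "/b": "x", "/c": "y"}
ok = run_transaction(
    store, [("scmp", "/a", "/b"), ("copy", "/c", "/a"), ("read", "/a"), ("read", "/c")]
)
bad = run_transaction(store, [("scmp", "/a", "/b"), ("write", "/b", "z")])
```

The first transaction passes the `scmp` guard and returns both reads from the same consistent snapshot. The second aborts because `/a` and `/b` now differ, so the write to `/b` is never applied.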
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-24, 01:54
Cages is a library built on top of ZooKeeper.
Accord provides a coordination service like ZooKeeper. Thus, a Cages-like system can be implemented on top of Accord.
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Ryan Rawson 2011-09-24, 02:09
Did you guys run HBase with accord and see improved performance?
What other "hooks" can you tell us that would be worth the immense task of learning the ins and outs of a new distributed system? Performance is great, but you can hack around that, and HBase is not a heavy user of ZK.

-ryan
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-24, 05:38
> Did you guys run HBase with accord and see improved performance?
Not yet, but I am trying to scale its performance as a lock manager (such as Cages). I will report the results.

> What other "hooks" can you tell us that would be worth the immense
> task of learning the ins and outs of a new distributed system?

One solution is a ZooKeeper-compatible layer (like a proxy). It would provide a ZK-compatible API and translate the ZK protocol into the Accord protocol. With this component, ZooKeeper users would not need to learn the new system or develop new libraries, while still benefiting from Accord's performance.
What do you think of this idea?

> Performance is great, but you can hack around that, and HBase is not a
> heavy user of ZK.

A high-throughput distributed lock manager (DLM) can extend the range of applications of HBase. HBase users can issue multi-row transactions until the DLM becomes the bottleneck.

--
OZAWA Tsuyoshi
NTT Cyber Space Laboratories
OSS Computing Project
Distributed Virtual Computing Technology Group
TEL 046-859-2351 FAX 046-855-1152
Email [EMAIL PROTECTED]
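The compatibility-layer idea can be pictured as a thin adapter that exposes ZK-style calls and forwards them to a generic read/write/del backend. Everything below is hypothetical: `InMemoryBackend` merely stands in for an Accord-style client, and `ZkStyleAdapter` with its method names is an illustration, not an existing component of either project.

```python
class InMemoryBackend:
    """Hypothetical stand-in for a client offering read/write/del."""

    def __init__(self):
        self.kv = {}

    def write(self, key, value):
        self.kv[key] = value

    def read(self, key):
        return self.kv.get(key)

    def delete(self, key):
        # Returns True if the key existed.
        return self.kv.pop(key, None) is not None


class ZkStyleAdapter:
    """Translates ZK-style node operations into backend calls."""

    def __init__(self, backend):
        self.backend = backend

    def create(self, path, data):
        # Note: this check-then-write is not atomic; a real layer would
        # need to wrap it in a backend transaction.
        if self.backend.read(path) is not None:
            raise KeyError(f"node exists: {path}")
        self.backend.write(path, data)
        return path

    def get_data(self, path):
        data = self.backend.read(path)
        if data is None:
            raise KeyError(f"no node: {path}")
        return data

    def set_data(self, path, data):
        self.get_data(path)  # raises if the node does not exist
        self.backend.write(path, data)

    def delete(self, path):
        if not self.backend.delete(path):
            raise KeyError(f"no node: {path}")


zk = ZkStyleAdapter(InMemoryBackend())
zk.create("/locks/row-1", b"client-7")
zk.set_data("/locks/row-1", b"client-8")
value = zk.get_data("/locks/row-1")
zk.delete("/locks/row-1")
```

Existing callers would keep using the ZK-shaped methods while the backend is swapped underneath. Sessions, watches, and versions are left out of this sketch and would be the hard part of a real layer.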
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Flavio Junqueira 2011-09-24, 21:43
Thanks for sending this reference to the list, it sounds very interesting. I have a few questions and comments, if you don't mind:

1- I was wondering if you can give more detail on the setup you used to generate the numbers you show in the graphs on your Accord page. The ZooKeeper values are way too low, and I suspect that you're using a single hard drive. It could be because you expect to use a single hard drive with an Accord server, and you wanted to make the comparison fair. Is this correct?

2- The previous observation leads me to the next question: could you say more about your use of disk with persistence on?

3- The limitation on the message size in ZooKeeper is not a fundamental limitation. We have chosen to limit for the reasons we explain in the wiki page that is linked in the Accord page. Do you have any particular use case in mind for which you think it would be useful to have very large messages?

4- If I understand the group communication substrate Accord uses, it enables Accord to process client requests in any server. ZooKeeper has a leader for a few reasons, one being the ability of managing client sessions. Ephemeral nodes, for example, are bound to sessions. Are there similar abstractions in Accord? If the answer is positive, could you explain it a bit? If not, is it doable with the substrate you're using?

5- I'm not sure where we say that 8 bytes is a typical value in the documentation. I actually remember writing in one of our papers that a typical value is around 1k bytes.

-Flavio

flavio junqueira
research scientist
[EMAIL PROTECTED]
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-25, 07:02
(2011/09/25 6:43), Flavio Junqueira wrote:
> Thanks for sending this reference to the list, it sounds very
> interesting. I have a few questions and comments, if you don't mind:
>
> 1- I was wondering if you can give more detail on the setup you used to
> generate the numbers you show in the graphs on your Accord page. The
> ZooKeeper values are way too low, and I suspect that you're using a
> single hard drive. It could be because you expect to use a single hard
> drive with an Accord server, and you wanted to make the comparison fair.
> Is this correct?

No, it isn't.
Both ZooKeeper and Accord use a dedicated hard drive for logging.
The configuration file I used is here:
https://gist.github.com/1240291

Please tell me if I have made a mistake.

> 2- The previous observation leads me to the next question: could you say
> more about your use of disk with persistence on?

ZooKeeper returns an ACK after the write has reached the disks of more than half of the machines.
Accord returns an ACK after writing to the disk of just one machine, the one that accepted the request. At the same time, however, the ACK assures that all servers receive the messages in the same order.
This difference in semantics means that the measurement is not fair.
I would like to measure under fair conditions, but I have not done so yet. If users request it, I will implement it and measure it. Note that the in-memory benchmark is fair.

> 3- The limitation on the message size in ZooKeeper is not a fundamental
> limitation. We have chosen to limit for the reasons we explain in the
> wiki page that is linked in the Accord page. Do you have any particular
> use case in mind for which you think it would be useful to have very
> large messages?

Some developers use ZooKeeper as storage. For example, the developers of Onix, a distributed control platform for OpenFlow networks, say:
"for most the object size limitations of Zookeeper and convenience of accessing the configuration state directly through the NIB are a reason to favor the transactional database."
http://www.usenix.org/event/osdi10/tech/full_papers/Koponen.pdf

> 4- If I understand the group communication substrate Accord uses, it
> enables Accord to process client requests in any server. ZooKeeper has a
> leader for a few reasons, one being the ability of managing client
> sessions. Ephemeral nodes, for example, are bound to sessions. Are there
> similar abstractions in Accord? If the answer is positive, could you
> explain it a bit? If not, is it doable with the substrate you're using?

Yes, Accord has an abstraction like ephemeral nodes.
We use the Corosync cluster engine, which provides Virtual Synchrony semantics. It guarantees that all servers (conductor daemons) agree on the ordering of messages and on the ordering of server failures.

> 5- I'm not sure where we say that 8 bytes is a typical value in the
> documentation. I actually remember writing in one of our papers that a
> typical value is around 1k bytes.

The benchmark assumes a lock workload. I am going to measure various message sizes and will report the results.

Please ask me if you have more questions or opinions.

- OZAWA Tsuyoshi
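The difference between the two acknowledgement rules described in the answer to question 2 can be illustrated with a small calculation. This is a simplified model with made-up latencies, not a benchmark: it assumes all disks are written in parallel, ignores network delay, and the function names are hypothetical.

```python
def quorum_ack_latency(disk_latencies_ms):
    """Time until more than half of the servers have written to disk."""
    majority = len(disk_latencies_ms) // 2 + 1
    # The ACK can go out once the majority-th fastest disk write completes.
    return sorted(disk_latencies_ms)[majority - 1]


def single_disk_ack_latency(disk_latencies_ms, accepting_server):
    """Time until only the server that accepted the request has written to disk."""
    return disk_latencies_ms[accepting_server]


# Hypothetical per-server disk write latencies in milliseconds.
latencies = [4.0, 9.0, 6.0, 12.0, 5.0]
quorum_ms = quorum_ack_latency(latencies)
single_ms = single_disk_ack_latency(latencies, accepting_server=0)
```

With these illustrative numbers the quorum rule waits for the third-fastest disk while the single-disk rule waits for one disk only, which is one reason the persistent-mode comparison in the thread is described as not fair.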
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
OZAWA Tsuyoshi 2011-09-25, 08:14
I looked through the code of the multi operation, but I found only write operations (create/putData/delete) and the version check operation.
In that case, it may be difficult to implement a compare-and-swap operation or a multi-read request effectively, because such operations have to return the old value or the new value.

If I have misunderstood, please tell me.
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Ted Dunning 2011-09-25, 11:14
On Sun, Sep 25, 2011 at 12:02 AM, OZAWA Tsuyoshi <[EMAIL PROTECTED]> wrote:

>> 1- I was wondering if you can give more detail on the setup you used to
>> generate the numbers you show in the graphs on your Accord page. The
>> ZooKeeper values are way too low, and I suspect that you're using a
>> single hard drive. It could be because you expect to use a single hard
>> drive with an Accord server, and you wanted to make the comparison fair.
>> Is this correct?
>
> No, it isn't.
> Both ZooKeeper and Accord use the dedicated hard drive for logging.

Zookeeper should have one hard drive for logging and one for snapshots to avoid seeks.

>> 2- The previous observation leads me to the next question: could you say
>> more about your use of disk with persistence on?
>
> ZooKeeper returns ACK after writing the disks of the over half machines.
> Accord returns ACK after writing the disk of just one machine, which
> accepted a request. However, at the same time, the ACK assures that all
> servers receive the messages in the same order.

It is a bit of an open question just how hard one should push durability. I believe that Volt, for instance, commits when enough servers confirm that they have queued up the log entry, but they don't wait for the logging to complete. Since the log writer can have very high throughput, this allows some very high throughput rates at the cost of some risk of regression if you lose power to all servers exactly simultaneously. Even with a blown circuit breaker, the power supply holdup time is commonly enough to flush a moderate amount of disk buffers (30ms or more). If you can stop committing instantly when power drops, it may be pretty safe. If you have any UPS with a power loss warning, then you are probably quite safe. If you are OK with in-order time slippage on disastrous power loss, then you should be fine.

> The difference of the semantics means that this measurement is not fair.
> I would like to measure the under fair situation, but not yet. If there are
> requests from users, I'm going to implement it and measure it. Note that the
> benchmark of in-memory is fair.

The in-memory throughput for Zookeeper looks about like the disk version should look.
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Flavio Junqueira 2011-09-25, 21:49
On Sep 25, 2011, at 9:02 AM, OZAWA Tsuyoshi wrote:
> (2011/09/25 6:43), Flavio Junqueira wrote:
>> Thanks for sending this reference to the list, it sounds very
>> interesting. I have a few questions and comments, if you don't mind:
>>
>> 1- I was wondering if you can give more detail on the setup you used to
>> generate the numbers you show in the graphs on your Accord page. The
>> ZooKeeper values are way too low, and I suspect that you're using a
>> single hard drive. It could be because you expect to use a single hard
>> drive with an Accord server, and you wanted to make the comparison fair.
>> Is this correct?
>
> No, it isn't.
> Both ZooKeeper and Accord use the dedicated hard drive for logging.
> Setting file I used is here:
> https://gist.github.com/1240291
>
> Please tell me if I have a mistake.

I gave a cursory look, and I can't see any obvious problem. It is intriguing that the numbers are so low. Have you tried with different numbers of servers? I'm not sure if I just missed this information, but what version of ZooKeeper are you looking at?

Also, if it is not too much trouble, could you please report on your read performance?

>> 2- The previous observation leads me to the next question: could you say
>> more about your use of disk with persistence on?
>
> ZooKeeper returns ACK after writing the disks of the over half machines.
> Accord returns ACK after writing the disk of just one machine, which
> accepted a request. However, at the same time, the ACK assures that all
> servers receive the messages in the same order.
> The difference of the semantics means that this measurement is not fair.
> I would like to measure the under fair situation, but not yet. If there
> are requests from users, I'm going to implement it and measure it. Note
> that the benchmark of in-memory is fair.

I'm not sure I understand this part. You say that an operation is ACKed after being written to one disk, but also that it is guaranteed to be delivered in the same order in all servers. Does it mean that Accord still replicates on other servers before ACKing but the other servers do not write to disk? Otherwise, the first server may crash and never come back, and the message cannot possibly be delivered by other servers.

One question related to this point: with Accord, do you replicate the original request message or the result of operation? Do you guarantee that each server executes a request or applies the result of a request exactly once? If not, what kind of semantics does Accord provide?

>> 3- The limitation on the message size in ZooKeeper is not a fundamental
>> limitation. We have chosen to limit for the reasons we explain in the
>> wiki page that is linked in the Accord page. Do you have any particular
>> use case in mind for which you think it would be useful to have very
>> large messages?
>
> Some developers use ZooKeeper as storage. For example, Onix developer, a
> implementation of open flow switch, says that :
> "for most the object size limitations of
> Zookeeper and convenience of accessing the configuration
> state directly through the NIB are a reason to favor the
> transactional database."
> http://www.usenix.org/event/osdi10/tech/full_papers/Koponen.pdf

The comment in the paper is exactly right, we instruct our users to store metadata in ZooKeeper and data elsewhere. There are systems designed to store bulk data, and ZooKeeper shouldn't try to compete with such storage systems, it is not our goal.

>> 4- If I understand the group communication substrate Accord uses, it
>> enables Accord to process client requests in any server. ZooKeeper has a
>> leader for a few reasons, one being the ability of managing client
>> sessions. Ephemeral nodes, for example, are bound to sessions. Are there
>> similar abstractions in Accord? If the answer is positive, could you
>> explain it a bit? If not, is it doable with the substrate you're using?

Good to know that you also support ephemerals. Could you say a little more about how you decide to eliminate an ephemeral node? I suppose that an ephemeral is bound to the client that created it somehow, and it is deleted if the client crashes or disconnects. What's the exact mechanism?

Sounds good, thanks.

-Flavio

flavio junqueira
research scientist
[EMAIL PROTECTED]
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301
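For readers following the question about replicating the request versus the result, here is a minimal Python sketch of the first option: every replica receives the same totally ordered requests and applies each one exactly once. The `Replica` class, the `deliver` and `replay` names, and the use of a per-request sequence number to detect duplicates are all assumptions made for illustration; the thread does not describe how Accord or Corosync implement this.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica that applies totally ordered requests to a key-value map."""
    store: dict = field(default_factory=dict)
    last_applied: int = 0  # highest sequence number applied so far

    def deliver(self, seq, op, key, value=None):
        """Apply one request; return False if it was a duplicate delivery."""
        if seq <= self.last_applied:
            return False  # already applied, so applying again is skipped
        if seq != self.last_applied + 1:
            raise ValueError("gap in the total order")
        if op == "write":
            self.store[key] = value
        elif op == "del":
            self.store.pop(key, None)
        else:
            raise ValueError("unknown operation: " + op)
        self.last_applied = seq
        return True


def replay(log):
    """Feed a delivery log (possibly containing duplicates) to a new replica."""
    replica = Replica()
    for entry in log:
        replica.deliver(*entry)
    return replica
```

Two replicas that replay the same log end in the same state, even if a message is delivered twice, because the sequence number check turns the second delivery into a no-op.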
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Ted Dunning 2011-09-25, 22:26
Also, what happens if the Accord cluster is split and then nodes are updated in each half of the split brain?

On Sun, Sep 25, 2011 at 2:49 PM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

>> ZooKeeper returns ACK after writing the disks of the over half machines.
>> Accord returns ACK after writing the disk of just one machine, which
>> accepted a request. However, at the same time, the ACK assures that all
>> servers receive the messages in the same order.
>> The difference of the semantics means that this measurement is not fair.
>> I would like to measure the under fair situation, but not yet. If there
>> are requests from users, I'm going to implement it and measure it. Note
>> that the benchmark of in-memory is fair.
>
> I'm not sure I understand this part. You say that an operation is ACKed
> after being written to one disk, but also that it is guaranteed to be
> delivered in the same order in all servers. Does it mean that Accord still
> replicates on other servers before ACKing but the other servers do not write
> to disk? Otherwise, the first server may crash and never come back, and the
> message cannot possibly be delivered by other servers.
>
> One question related to this point: with Accord, do you replicate the
> original request message or the result of operation? Do you guarantee that
> each server executes a request or applies the result of a request exactly
> once? If not, what kind of semantics does Accord provide?
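The split-brain concern can be made concrete with a small Python sketch of the majority rule that quorum-based systems such as ZooKeeper rely on: a partition may accept writes only if it holds more than half of the configured servers, so at most one side can make progress. The function names are hypothetical, and the sketch says nothing about what Accord or Corosync actually do under a partition, which this thread leaves unanswered.

```python
def majority(cluster_size):
    """Smallest number of servers that is more than half of the cluster."""
    return cluster_size // 2 + 1


def sides_that_can_commit(partition_sizes, cluster_size):
    """Return the partition sizes that still hold a majority quorum."""
    if sum(partition_sizes) > cluster_size:
        raise ValueError("partitions contain more servers than the cluster")
    quorum = majority(cluster_size)
    return [size for size in partition_sizes if size >= quorum]
```

Because two disjoint partitions cannot both contain more than half of the servers, the returned list never has more than one entry; a system that accepts writes on every side of a split has to reconcile the diverging updates some other way.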
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Tsuyoshi OZAWA 2011-09-26, 02:18
On Sun, Sep 25, 2011 at 8:14 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Zookeeper should have one hard drive for logging and one for snapshots to
> avoid seeks.

Yes, I used two HDDs. One is dedicated to logging, the other is dedicated to snapshots.

>>> 2- The previous observation leads me to the next question: could you say
>>> more about your use of disk with persistence on?
>>
>> ZooKeeper returns ACK after writing the disks of the over half machines.
>> Accord returns ACK after writing the disk of just one machine, which
>> accepted a request. However, at the same time, the ACK assures that all
>> servers receive the messages in the same order.
>
> It is a bit of an open question about just how hard one should push
> durability. I believe that Volt, for instance commits when enough servers
> confirm that they have queued up the log entry, but they don't wait for the
> logging to complete. Since the log writer can have very high throughput,
> this allows some very high throughput rates at the cost of some risk of
> regression if you lose power to all servers exactly simultaneously. Even
> with a blown circuit breaker, the power supply holdup time is commonly
> enough to flush a moderate amount of disk buffers (30ms or more). If you
> can stop committing instantly when power drops, it may be pretty safe. If
> you have any UPS with a power loss warning, then you are probably quite
> safe. If you are OK with in-order time slippage on disastrous power loss
> then you should be fine.

Yeah, this is the tradeoff between fault tolerance and performance. One proposal is a pluggable storage layer for ZooKeeper. It would work like MySQL's pluggable storage layer. Users who need fault tolerance use the storage and messaging engine of ZooKeeper, while users who need performance use those of Accord. ZooKeeper users could then select the semantics that fit their use case.

>> The difference of the semantics means that this measurement is not fair.
>> I would like to measure the under fair situation, but not yet. If there are
>> requests from users, I'm going to implement it and measure it. Note that the
>> benchmark of in-memory is fair.
>
> The in-memory throughput for Zookeeper looks about like the disk version
> should look.

The benchmark is measured with ZooKeeper on /dev/shm device.
Is there an implementation of ZooKeeper in-memory mode?
If the answer is positive, I'll benchmark with it.

--
OZAWA Tsuyoshi <[EMAIL PROTECTED]>
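The durability tradeoff discussed above can be summarized as three different rules for when a write is acknowledged. The Python sketch below encodes them as simplified predicates, based only on how the policies are described in this thread; the `WriteStatus` type and the function names are hypothetical, and none of this is taken from the ZooKeeper, Accord, or Volt code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteStatus:
    """Progress of one write across the cluster."""
    cluster_size: int
    received: int  # servers holding the message, in order, in memory
    fsynced: int   # servers that have flushed the message to disk


def ack_majority_fsync(status):
    """ZooKeeper-style, as described in the thread: over half have written to disk."""
    return status.fsynced >= status.cluster_size // 2 + 1


def ack_local_fsync(status):
    """Accord-style, as described in the thread: one disk write, all servers received."""
    return status.received == status.cluster_size and status.fsynced >= 1


def ack_majority_queued(status):
    """Volt-style, as Ted describes it: enough servers queued the entry, no disk wait."""
    return status.received >= status.cluster_size // 2 + 1
```

For a five-server cluster where all servers have received a write but only one has flushed it, the second and third rules acknowledge while the first does not, which is why the persistent-mode benchmark is not an equal comparison.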
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Ted Dunning 2011-09-26, 11:56
I meant that the performance of ZK in your test using /dev/shm looks about the same as ZK on a hard-disk should look.

What file system is the log file and snapshot on?

On Sun, Sep 25, 2011 at 7:18 PM, Tsuyoshi OZAWA <[EMAIL PROTECTED]> wrote:

> The benchmark is measured with ZooKeeper on /dev/shm device.
> Is there an implementation of ZooKeeper in-memory mode?
> If the answer is positive, I'll benchmark with it.
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Tsuyoshi OZAWA 2011-09-27, 02:19
On Mon, Sep 26, 2011 at 6:49 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
> On Sep 25, 2011, at 9:02 AM, OZAWA Tsuyoshi wrote:
>
>> (2011/09/25 6:43), Flavio Junqueira wrote:
>>>
>>> Thanks for sending this reference to the list, it sounds very
>>> interesting. I have a few questions and comments, if you don't mind:
>>>
>>> 1- I was wondering if you can give more detail on the setup you used to
>>> generate the numbers you show in the graphs on your Accord page. The
>>> ZooKeeper values are way too low, and I suspect that you're using a
>>> single hard drive. It could be because you expect to use a single hard
>>> drive with an Accord server, and you wanted to make the comparison fair.
>>> Is this correct?
>>
>> No, it isn't.
>> Both ZooKeeper and Accord use the dedicated hard drive for logging.
>> Setting file I used is here:
>> https://gist.github.com/1240291
>>
>> Please tell me if I have a mistake.
>
> I gave a cursory look, and I can't see any obvious problem. It is intriguing
> that the numbers are so low. Have you tried with different numbers of
> servers? I'm not sure if I just missed this information, but what version of
> ZooKeeper are you looking at?

I use ZooKeeper 3.3.3, the latest release version, with 7200 rpm HDDs. The benchmarks were measured with 3, 5, and 7 servers.

> Also, if it is not too much trouble, could you please report on your read
> performance?

I'll measure it and report it later.

>>> 2- The previous observation leads me to the next question: could you say
>>> more about your use of disk with persistence on?
>>
>> ZooKeeper returns ACK after writing the disks of the over half machines.
>> Accord returns ACK after writing the disk of just one machine, which
>> accepted a request. However, at the same time, the ACK assures that all
>> servers receive the messages in the same order.
>> The difference of the semantics means that this measurement is not fair.
>> I would like to measure the under fair situation, but not yet. If there
>> are requests from users, I'm going to implement it and measure it. Note
>> that the benchmark of in-memory is fair.
>
> I'm not sure I understand this part. You say that an operation is ACKed
> after being written to one disk, but also that it is guaranteed to be
> delivered in the same order in all servers. Does it mean that Accord still
> replicates on other servers before ACKing but the other servers do not write
> to disk?

Yes, you're right.

> One question related to this point: with Accord, do you replicate the
> original request message or the result of operation?

Accord replicates the request message.

> Do you guarantee that each server executes a request or applies
> the result of a request exactly once?

Yes, it does.

> The comment in the paper is exactly right, we instruct our users to store
> metadata in ZooKeeper and data elsewhere. There are systems designed to
> store bulk data, and ZooKeeper shouldn't try to compete with such storage
> systems, it is not our goal.

You're right. We decided to support unlimited data size because some write-intensive applications, such as a replicated queue or a replicated database, need to send big messages, though a big message can block the messages that follow it. In that way, I think that Accord may not compete with ZooKeeper, because the application ranges are different. ZooKeeper is suitable for read-mostly applications such as sharing configuration, leader election, and failure detection, while Accord is more suitable for write-mostly applications.

>>> 4- If I understand the group communication substrate Accord uses, it
>>> enables Accord to process client requests in any server. ZooKeeper has a
>>> leader for a few reasons, one being the ability of managing client
>>> sessions. Ephemeral nodes, for example, are bound to sessions. Are there
>>> similar abstractions in Accord? If the answer is positive, could you
>>> explain it a bit? If not, is it doable with the substrate you're using?
>>
>> Yes, Accord has abstractions like Ephemeral nodes.

Accord currently provides more primitive abstractions. Accord servers assign a client id when a client joins the Accord cluster, and notify the other clients of that id when the client leaves. When a client writes a node that should be ephemeral, it marks the node with its own id. The other clients, on receiving the notification that a client has left, try to delete the nodes which that client created.

OZAWA Tsuyoshi <[EMAIL PROTECTED]>
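The client-driven cleanup described above can be sketched in a few lines of Python. This is only an illustration of the idea (nodes tagged with the writer's client id, removed when a leave notification for that id arrives); the `EphemeralRegistry` class and its method names are hypothetical and are not the Accord client API.

```python
class EphemeralRegistry:
    """Toy view of the store: key -> (value, owner client id or None)."""

    def __init__(self):
        self.nodes = {}

    def write(self, key, value, owner_id=None):
        """Write a node; passing owner_id marks it as ephemeral for that client."""
        self.nodes[key] = (value, owner_id)

    def on_client_left(self, left_id):
        """Handle a leave notification by deleting nodes owned by that client."""
        doomed = [k for k, (_, owner) in self.nodes.items()
                  if owner is not None and owner == left_id]
        for key in doomed:
            del self.nodes[key]
        return sorted(doomed)
```

In this scheme the deletion is carried out by the surviving clients rather than by a session-tracking leader, so several clients may attempt the same delete and the delete has to be safe to repeat.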
-
Re: [announce] Accord: A high-performance coordination service for write-intensive workloads
Tsuyoshi OZAWA 2011-09-27, 02:30
On Mon, Sep 26, 2011 at 8:56 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> I meant that the performance of ZK in your test using /dev/shm looks about
> the same as ZK on a hard-disk should look.
>
> What file system is the log file and snapshot on?

Both the log and snapshot files are on /dev/shm. ZooKeeper recorded 40k ops/sec.

--
OZAWA Tsuyoshi <[EMAIL PROTECTED]>