-
KAFKA-50 replication support and the Disruptor
Erik van Oosten, 2011-10-30, 20:01

Hello,

The upcoming replication support (which we eagerly anticipate at my work) is a feature for which LMAX's disruptor is an ideal solution (http://code.google.com/p/disruptor/, Apache licensed). A colleague has in fact just started on a new replicating message broker based on it (https://github.com/cdegroot/underground).

The disruptor itself is a very fast in-JVM producer/consumer system. A consumer normally works in its own thread. The disruptor gets most of its speed because it is designed such that each consumer can continue working without releasing the CPU to the OS or to other threads. In addition, it is optimized for modern CPU architectures, for example by respecting the way the CPU cache works, by avoiding all locking and CAS operations, and even by keeping volatile reads and writes to a minimum.

Consumers may depend on the work of other consumers. The disruptor only offers new messages to a consumer (in bulk if possible) once the preceding consumers have processed them.

For KAFKA-50 we can (for example) think of the following tasks:
-a- get incoming new messages (the producer)
-b- pre-processor (calculate checksum and offset)
-c- write to journal
-d- write to replica broker, wait for confirmation
-e- notify consumers (no changes here)

With the disruptor the main flow would be coded as:

    disruptor
        .handleEventsWith(preprocessor)
        .then(journaller, replicator)
        .then(notifier)

Journalling and replication of a message are thus executed in parallel.

If this approach is considered, feel free to ask me about the disruptor. Hopefully I will also find some time to write some code as well.

Kind regards,
Erik.

--
Erik van Oosten
http://www.day-to-day-stuff.blogspot.com/
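To make the proposed stage graph concrete, here is a minimal sketch of the same dependencies (pre-process, then journal and replicate in parallel, then notify). It deliberately uses only the JDK's CompletableFuture and is not the Disruptor API; the class name ReplicationPipelineSketch, the Message type and the trace list are hypothetical names introduced for illustration, and the journal and replicate stages are stubs.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.zip.CRC32;

// Stdlib-only illustration of the stage dependencies; NOT the Disruptor API.
final class ReplicationPipelineSketch {

    // Hypothetical message type: payload plus the fields the pre-processor fills in.
    static final class Message {
        final byte[] payload;
        long checksum;
        long offset;
        Message(byte[] payload) { this.payload = payload; }
    }

    // Runs one message through the graph and returns the order in which stages ran.
    static List<String> process(String text, long offset) {
        List<String> trace = new CopyOnWriteArrayList<>();
        Message msg = new Message(text.getBytes(StandardCharsets.UTF_8));

        // (b) pre-processor: calculate checksum and offset.
        CompletableFuture<Message> preprocessed = CompletableFuture.supplyAsync(() -> {
            CRC32 crc = new CRC32();
            crc.update(msg.payload);
            msg.checksum = crc.getValue();
            msg.offset = offset;
            trace.add("preprocess");
            return msg;
        });

        // (c) and (d) depend only on (b), so they are free to run in parallel.
        // Both are stubs here: no real journal or replica is involved.
        CompletableFuture<Void> journalled =
                preprocessed.thenAcceptAsync(m -> trace.add("journal"));
        CompletableFuture<Void> replicated =
                preprocessed.thenAcceptAsync(m -> trace.add("replicate"));

        // (e) notify consumers only after BOTH journalling and replication are done.
        CompletableFuture<Void> notified = CompletableFuture
                .allOf(journalled, replicated)
                .thenRun(() -> trace.add("notify"));

        notified.join();
        return trace;
    }

    public static void main(String[] args) {
        // "journal" and "replicate" may appear in either order between the other two.
        System.out.println(process("hello", 0L));
    }
}
```

The sketch only shows the ordering constraints. It allocates futures per message and hands work to a thread pool, which is exactly the kind of overhead the disruptor's pre-allocated ring and dedicated consumer threads are meant to avoid.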
-
Re: KAFKA-50 replication support and the Disruptor
Jay Kreps, 2011-10-30, 21:02

This is interesting. But wouldn't the cost of the I/O here (writing to log, requests to slave nodes) completely dominate the cost of locks?

-Jay
-
Re: KAFKA-50 replication support and the Disruptor
Erik van Oosten, 2011-10-31, 08:23

> This is interesting. But wouldn't the cost of the I/O here (writing to log,
> requests to slave nodes) completely dominate the cost of locks?

That is not the point (mostly). While you're waiting for a lock, you can't issue another IO request. Avoiding locking is worthwhile even if CPU is the bottleneck. The advantage is that you'll get lower latency and, also important, less jitter.

As you know, given the right hardware, sequential writes to disk are already very fast. If you jump through some hoops (e.g. avoid TCP, use a user-space IP stack) the same applies to the network. In a carefully coded async system, I am not convinced up front that I/O would dominate locking overhead.

In fact, what the LMAX guys found out is that the synchronization overhead of e.g. an ArrayBlockingQueue (the fastest queue they could find) completely dwarfs any other CPU processing you might want to do. In their setup they process 6M messages per second on (by now) old hardware. That includes journalling, replicating to a standby node, doing some financial transaction stuff and then sending a reply in lock step with the standby node.

Kind regards,
Erik.
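As an illustration of the lock-free hand-off discussed in this message, the following is a simplified single-producer/single-consumer ring that coordinates through two sequence counters instead of a lock. It is a sketch under stated assumptions, not the Disruptor's actual code: the class SpscRingSketch and its offer/drain methods are hypothetical names, it carries plain long values, and it assumes exactly one producer thread and one consumer thread on a Java 9+ JDK (for Thread.onSpinWait).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Simplified single-producer/single-consumer ring; an illustration only.
final class SpscRingSketch {
    private final long[] slots;   // pre-allocated storage, reused forever
    private final int mask;       // size - 1, valid because size is a power of two
    private final AtomicLong published = new AtomicLong(0); // next sequence to write
    private final AtomicLong consumed = new AtomicLong(0);  // next sequence to read

    SpscRingSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        slots = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Producer side: returns false when the ring is full. No lock, no CAS.
    boolean offer(long value) {
        long seq = published.get();
        if (seq - consumed.get() == slots.length) {
            return false; // consumer has not freed a slot yet
        }
        slots[(int) (seq & mask)] = value;
        published.lazySet(seq + 1); // ordered store: slot is written before it is published
        return true;
    }

    // Consumer side: handles everything published so far as one batch.
    // Returns the number of values handled.
    int drain(LongConsumer handler) {
        long next = consumed.get();
        long available = published.get();
        int handled = 0;
        for (; next < available; next++, handled++) {
            handler.accept(slots[(int) (next & mask)]);
        }
        consumed.lazySet(next); // frees the slots for the producer
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRingSketch ring = new SpscRingSketch(1024);
        final int total = 1_000_000;
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= total; i++) {
                while (!ring.offer(i)) {
                    Thread.onSpinWait(); // busy-spin while the ring is full
                }
            }
        });
        producer.start();
        long[] sum = {0};
        int seen = 0;
        while (seen < total) {
            seen += ring.drain(v -> sum[0] += v);
        }
        producer.join();
        System.out.println(sum[0]);
    }
}
```

Neither side ever blocks the other: a slow consumer only makes offer return false, and the consumer picks up whatever has accumulated in a single batch, which is the "in bulk if possible" behaviour described in the first message.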
-
Re: KAFKA-50 replication support and the Disruptor
Jun Rao, 2011-10-31, 16:06

Erik,

Thanks for the pointer. This could be a useful optimization. However, we probably don't want to optimize too early, before we understand the use cases. Also, the replication logic itself is already non-trivial. Perhaps we can look into this after the first version of replication is done?

Thanks,

Jun
-
Re: KAFKA-50 replication support and the Disruptor
Erik van Oosten, 2011-10-31, 16:21

No problem.

You could first program all tasks serially and then convert them to use the disruptor later. With cleanly separated tasks this should be very easy to do. So easy in fact that it might not be worth the wait ;)

BTW, I respect the non-triviality of multi-broker replication. That reminds me, is there an update of the proposal design documents?

Kind regards,
Erik.
-
Re: KAFKA-50 replication support and the Disruptor
Jun Rao, 2011-10-31, 17:21

Erik,

There have been some discussions on the JIRA, but the design doc has been updated. If you feel that something needs to be changed, feel free to comment on the JIRA.

Thanks,

Jun
-
Re: KAFKA-50 replication support and the Disruptor
Chris Burroughs, 2011-10-31, 20:38

On 10/31/2011 04:23 AM, Erik van Oosten wrote:
> That is not the point (mostly). While you're waiting for a lock, you can't issue another IO request. Avoiding locking is worthwhile even if CPU is the bottleneck. The advantage is that you'll get lower latency and also important, less jitter.

/begin{Tangent}

Doesn't the Disruptor use a spin lock though? I would expect that to not play nice if sharing a core with CPU bound threads doing 'real' work.
-
Re: KAFKA-50 replication support and the Disruptor
Jay Kreps, 2011-10-31, 21:00

Cool, makes sense.

I think the design doc is the current best thinking for replication. I believe the plan for the LinkedIn crew is to try to get the Apache release out the door and then begin working on the various replication tickets for 0.8.x. I got a little ahead of myself and started on the async socket server, which I think is actually closely related to the disruptor stuff, so I need to go and understand that better.

-Jay
-
Re: KAFKA-50 replication support and the Disruptor
Erik van Oosten, 2011-11-01, 18:34

There are several wait strategies. You will want to use the spin lock in production environments, where you should have enough CPU cores anyway. Remember, the 'real' work runs in another, always-running thread that also uses a spin lock to wait for more work. In a dev environment, or on hosts that need to do lots of other stuff, you definitely need another wait strategy.

Erik.
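The trade-off between wait strategies can be sketched as follows. This is a stdlib-only illustration and not the Disruptor's own classes: the WaitStrategy interface, the BUSY_SPIN and BACK_OFF constants and the demo helper are hypothetical names, and the spin counts and the 1 microsecond park are arbitrary assumptions. It assumes a Java 9+ JDK for Thread.onSpinWait.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Illustration of two ways a consumer can wait for a sequence to be published.
final class WaitStrategySketch {

    // Hypothetical interface: return once cursor >= sequence, giving the cursor value seen.
    interface WaitStrategy {
        long waitFor(long sequence, AtomicLong cursor);
    }

    // Lowest latency, but keeps one core fully busy while waiting.
    // Suited to hosts with a spare core per consumer thread.
    static final WaitStrategy BUSY_SPIN = (sequence, cursor) -> {
        long seen;
        while ((seen = cursor.get()) < sequence) {
            Thread.onSpinWait();
        }
        return seen;
    };

    // Friendlier to shared hosts: spin briefly, then yield, then park for short intervals.
    // Costs some latency and jitter in exchange for giving the CPU back.
    static final WaitStrategy BACK_OFF = (sequence, cursor) -> {
        long seen;
        int attempts = 0;
        while ((seen = cursor.get()) < sequence) {
            if (attempts < 100) {
                attempts++;
            } else if (attempts < 200) {
                attempts++;
                Thread.yield();
            } else {
                LockSupport.parkNanos(1_000L);
            }
        }
        return seen;
    };

    // Starts a publisher that advances the cursor to 7 after roughly 5 ms,
    // and waits for that sequence with the given strategy.
    static long demo(WaitStrategy strategy) {
        AtomicLong cursor = new AtomicLong(-1);
        Thread publisher = new Thread(() -> {
            LockSupport.parkNanos(5_000_000L);
            cursor.set(7);
        });
        publisher.start();
        return strategy.waitFor(7, cursor);
    }

    public static void main(String[] args) {
        System.out.println(demo(BUSY_SPIN));
        System.out.println(demo(BACK_OFF));
    }
}
```

Both strategies return the same result; they differ only in what the waiting thread does with its core in the meantime, which is the point of the question about sharing a core with CPU-bound threads.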