Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Transactional writing


Copy link to this message
-
Re: Transactional writing
If you use Kafka just as a redo log, you can't undo anything that's written
to the log. Write-ahead logs in typical database systems are both redo and
undo logs. Transaction commits and rollbacks are implemented on top of the
logs. However, general-purpose write-ahead logs for transactions are much
more complicated.

Thanks,

Jun

On Fri, Oct 26, 2012 at 11:08 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> This is an important feature and I am interested in helping out in the
> design and implementation, though I am working on 0.8 features for the next
> month so I may not be of too much use. I have thought a little bit about
> this, but I am not yet sure of the best approach.
>
> Here is a specific use case I think is important to address: consider a
> case where you are doing processing of one or more streams and producing an
> output stream. This processing may involve some kind of local state (say
> counters or other local aggregation intermediate state). This is a common
> scenario. The problem is to give reasonable semantics to this computation
> in the presence of failures. The processor effectively has a
> position/offset in each of its input streams as well as whatever local
> state. The problem is that if this process fails it needs to restore to a
> state that matches the last produced messages. There are several solutions
> to this problem. One is to make the output somehow idempotent, this will
> solve some cases but is not a general solution as many things cannot be
> made idempotent easily.
>
> I think the two proposals you give outline a couple of basic approaches:
> 1. Store the messages on the server somewhere but don't add them to the log
> until the commit call
> 2. Store the messages in the log but don't make them available to the
> consumer until the commit call
> Another option you didn't mention:
>
> I can give several subtleties to these approaches.
>
> One advantage of the second approach is that messages are in the log and
> can be available for reading or not. This makes it possible to support a
> kind of "dirty read" that allows the consumer to specify whether they want
> to immediately see all messages with low latency but potentially see
> uncommitted messages or only see committed messages.
>
> The problem with the second approach at least in the way you describe it is
> that you have to lock the log until the commit occurs otherwise you can't
> roll back (because otherwise someone else may have appended their own
> messages and you can't truncate the log). This would have all the problems
> of remote locks. I think this might be a deal-breaker.
>
> Another variation on the second approach would be the following: have each
> producer maintain an id and generation number. Keep a schedule of valid
> offset/id/generation numbers on the broker and only hand these out. This
> solution would support non-blocking multi-writer appends but requires more
> participation from the producer (i.e. getting a generation number and id).
>
> Cheers,
>
> -Jay
>
> On Thu, Oct 25, 2012 at 7:04 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
>
> > I have come up with two different possibilities, both with different
> > trade-offs.
> >
> > The first would be to support "true" transactions by writing
> > transactional data into a temporary file and then copy it directly to
> > the end of the partition when the commit command is created. The
> > upside to this approach is that individual transactions can be larger
> > than a single batch, and more than one producer could conduct
> > transactions at once. The downside is the extra IO involved in writing
> > it and reading it from disk an extra time.
> >
> > The second would be to allow any number of messages to be appended to
> > a topic, but not move the "end of topic" offset until the commit was
> > received. If a rollback was received, or the producer timed out, the
> > partition could be truncated at the most recently recognized "end of
> > topic" offset. The upside is that there is very little extra IO (only