Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Transactional writing


Copy link to this message
-
Re: Transactional writing
For this particular use case, you can potentially include the message
offset from broker A in the message sent to broker B. If the transaction
fails, you read the last message from broker B and use the included offset
to resume the consumer from broker A. This assumes that there is only a
single client writing to broker B (for a particular partition).

Thanks,

Jun

On Fri, Oct 26, 2012 at 11:31 AM, Guozhang Wang <[EMAIL PROTECTED]> wrote:

> I am also quite interested in this thread, and I have another question
> here to ask about committing consumed messages. For example, if I need a
> program which acts both as a consumer and a producer, and the actions are
> wrapped in a "transaction":
>
> Transaction start:
>
>    Get next message from broker A;
>
>    Do something;
>
>    Send a message to broker B;
>
> Commit.
>
>
> If the transaction aborts after reading the message from broker A, is it
> possible to logically "put the message back" to brokers? I remember that
> Amazon Queue Service use some sort of lease mechanism, which might work for
> this case. But I am afraid that will affect the throughput a lot..
>
> --Guozhang
>
>
> -----Original Message-----
> From: Jay Kreps [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 26, 2012 2:08 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Transactional writing
>
> This is an important feature and I am interested in helping out in the
> design and implementation, though I am working on 0.8 features for the next
> month so I may not be of too much use. I have thought a little bit about
> this, but I am not yet sure of the best approach.
>
> Here is a specific use case I think is important to address: consider a
> case where you are doing processing of one or more streams and producing an
> output stream. This processing may involve some kind of local state (say
> counters or other local aggregation intermediate state). This is a common
> scenario. The problem is to give reasonable semantics to this computation
> in the presence of failures. The processor effectively has a
> position/offset in each of its input streams as well as whatever local
> state. The problem is that if this process fails it needs to restore to a
> state that matches the last produced messages. There are several solutions
> to this problem. One is to make the output somehow idempotent, this will
> solve some cases but is not a general solution as many things cannot be
> made idempotent easily.
>
> I think the two proposals you give outline a couple of basic approaches:
> 1. Store the messages on the server somewhere but don't add them to the log
> until the commit call
> 2. Store the messages in the log but don't make them available to the
> consumer until the commit call
> Another option you didn't mention:
>
> I can give several subtleties to these approaches.
>
> One advantage of the second approach is that messages are in the log and
> can be available for reading or not. This makes it possible to support a
> kind of "dirty read" that allows the consumer to specify whether they want
> to immediately see all messages with low latency but potentially see
> uncommitted messages or only see committed messages.
>
> The problem with the second approach at least in the way you describe it is
> that you have to lock the log until the commit occurs otherwise you can't
> roll back (because otherwise someone else may have appended their own
> messages and you can't truncate the log). This would have all the problems
> of remote locks. I think this might be a deal-breaker.
>
> Another variation on the second approach would be the following: have each
> producer maintain an id and generation number. Keep a schedule of valid
> offset/id/generation numbers on the broker and only hand these out. This
> solution would support non-blocking multi-writer appends but requires more
> participation from the producer (i.e. getting a generation number and id).
>
> Cheers,
>
> -Jay
>
> On Thu, Oct 25, 2012 at 7:04 PM, Tom Brown <[EMAIL PROTECTED]> wrote: