
Kafka >> mail # user >> Would this work as a general solution for transactions in Kafka?

Re: Would this work as a general solution for transactions in Kafka?
I don't think all messages need to be sequential. You just need to omit
messages from failed transactions in serving fetch requests, and this
requires storage proportional to the number of failed transactions. The
assumption is that failed transactions are very rare (i.e. due to machine
failures) so this should be small.
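
To make that concrete, here is a rough sketch of the filtering idea, not a worked-out design. It assumes each message carries a transaction id and that the broker learns about failures through some `mark_failed` call; both are hypothetical, as are all the names below. The only extra state is the set of failed transaction ids.

```python
class PartitionLog:
    """Toy in-memory partition. All names here are hypothetical."""

    def __init__(self):
        self.messages = []        # list index doubles as the offset
        self.failed_txns = set()  # grows only with the number of failed transactions

    def append(self, txn_id, payload):
        """Store a message tagged with its transaction id and return its offset."""
        self.messages.append((txn_id, payload))
        return len(self.messages) - 1

    def mark_failed(self, txn_id):
        """Record that a transaction failed; its messages stay in the log."""
        self.failed_txns.add(txn_id)

    def fetch(self, offset, max_messages):
        """Serve a fetch, skipping messages that belong to failed transactions."""
        out = []
        while offset < len(self.messages) and len(out) < max_messages:
            txn_id, payload = self.messages[offset]
            if txn_id not in self.failed_txns:
                out.append((offset, payload))
            offset += 1
        return out
```

Messages from two producers can interleave freely here: if transaction "b" fails, a fetch simply skips its offsets and returns the rest in log order.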

WRT client versus server the assumption is that all control messages are
useful to some consumer so reading all of them on the server side should
not be a limitation.

There are a number of things not worked out here, so I wouldn't take it too
seriously. I just wanted to throw out the thought experiment because, to
really be useful, I do think it is necessary to allow multiple producers and
move any complex logic to the server side.
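
For illustration only, here is one possible reading of the producer id / generation id scheme and the read_committed fetch flag from my earlier message quoted below. It assumes a commit promotes a producer's pending messages of the same generation and rolls back its pending messages of any older generation. The class, method, and state names are hypothetical.

```python
PENDING, COMMITTED, ROLLED_BACK = "pending", "committed", "rolled_back"


class TransactionalLog:
    """Toy log shared by several producers. All names here are hypothetical."""

    def __init__(self):
        # offset -> [producer_id, generation, payload, state]
        self.entries = []

    def append(self, producer_id, generation, payload):
        """Append an uncommitted message and return its offset."""
        self.entries.append([producer_id, generation, payload, PENDING])
        return len(self.entries) - 1

    def commit(self, producer_id, generation):
        """Commit this producer's pending messages of the given generation.

        Pending messages from an older generation are implicitly rolled back.
        """
        for entry in self.entries:
            if entry[0] != producer_id or entry[3] != PENDING:
                continue
            if entry[1] == generation:
                entry[3] = COMMITTED
            elif entry[1] < generation:
                entry[3] = ROLLED_BACK

    def fetch(self, offset, read_committed=True):
        """Return (offset, payload) pairs starting at the given offset.

        With read_committed set, stop at the first message still waiting
        for a commit and skip messages that were rolled back.
        """
        out = []
        for off in range(offset, len(self.entries)):
            _, _, payload, state = self.entries[off]
            if read_committed:
                if state == PENDING:
                    break
                if state == ROLLED_BACK:
                    continue
            out.append((off, payload))
        return out
```

In this sketch a slow producer holds back everything behind its first uncommitted offset, which is the cost of not requiring a single producer per partition.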

On Fri, Nov 16, 2012 at 8:46 AM, Tom Brown <[EMAIL PROTECTED]> wrote:

> Jay,
> I'm not sure how you're going to get around the issue of a single
> producer per partition. For efficient reads, all of the messages from
> a single transaction have to be sequential, and that only happens if
> either a) the messages are all written atomically (perhaps from
> memory, or temporary storage, etc), or b) all messages come from a
> single producer.
> If you use a single (internal) control partition for all topics the
> server would need to read and ignore irrelevant transaction records
> from topics the consumer isn't interested in. Also, you would not be
> able to effectively delete a single partition (though that may only be
> valuable for developers). That said, the simplicity of a single
> control partition may outweigh those problems.
> --Tom
> On Thu, Nov 15, 2012 at 6:24 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > Hey Tom,
> >
> > Yes, this is very similar to what I had in mind.
> >
> > The primary difference is that I want to implement the control on the
> > server-side. That is, rather than having the consumer be smart and use the
> > control topic directly it would be preferable to have the server handle
> > this. This way it would be easy to carry this logic across consumers in a
> > variety of languages. The implementation would be that we add a new
> > parameter to the fetch request read_committed={true, false}. If this
> > parameter is set to true then we would not hand out messages until we had
> > the commit message for the requested offset. The other advantage of doing
> > this on the server side is that I think we could then have only a single
> > control/commit topic rather than one per data topic.
> >
> > I think there might also be an alternative to requiring exclusivity on the
> > producer side--indeed requiring this makes the feature a lot less useful.
> > This requires waiting until all offsets in a given range are committed
> > before it can be handed out, though this is more complex. The details of my
> > proposal involved a unique producer id per producer and a generation id
> > that increased on every "rollback".  A commit with a higher generation id
> > for an existing producer id would implicitly roll back everything that
> > producer sent since the last commit.
> >
> > -Jay
> >
> >
> > On Wed, Nov 14, 2012 at 12:12 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> >
> >> Just thought of a way to do transactions in Kafka. I think this
> >> solution would cover the most common types of transactions. However,
> >> it's often useful to run an idea by a second set of eyes. I am
> >> interested in knowing where the holes are in this design that I
> >> haven't been able to see. If you're interested in transactional kafka,
> >> please review this and let me know any feedback you have.
> >>
> >> A transactional topic can be approximated by using a second topic as a
> >> control stream. Each message in the control topic would contain the
> >> offset and length (and an optional transaction ID). There is no change
> >> to the messages written to the data topic. The performance impact
> >> would generally be low-- the larger the transaction size, the less the