Kafka >> mail # user >> Transactional writing


Re: Transactional writing
Why wouldn't the storm approach provide semantics of exactly once
delivery? https://github.com/nathanmarz/storm

Nathan actually credits the Kafka devs for the basic idea of transaction
persistence in one of his talks.

Regards
Milind

On Nov 3, 2012 11:51 AM, "Rohit Prasad" <[EMAIL PROTECTED]> wrote:

> I agree that this approach only prevents duplicate messages to a partition
> from the Producer side. There needs to be a similar approach on the
> consumer side too. Using ZK can be one solution; there are non-ZK
> approaches as well.
>
> Even if the Consumer reads either none or all of the messages of a
> transaction, that does not solve the transaction problem yet, because the
> business/application logic inside the Consumer thread may execute partially
> and then fail. So it becomes tricky to decide the point at which you can say
> that you have "consumed" the message and increase the consumption offset. If
> your consumer thread is saving some value into DB/HDFS/etc., ideally you
> want this save operation and the consumption-offset increment to happen
> atomically. That's why it boils down to the application logic implementing
> transactions and dealing with duplicates.
> Maybe a journalling or redo-log approach on the Consumer side can help
> build such a system.
>
> It would be nice if eventually Kafka could be a transport that provides
> "exactly once" semantics for message delivery. Then consumer threads could
> be sure that they receive each message exactly once, and could build
> application logic on top of that.
>
> I have a use case similar to what Jay mentioned in a previous mail. I want
> to do aggregation but want the aggregated data to be correct, possibly
> avoiding duplicates in case of failures/crashes.
>
>
>
> On Fri, Nov 2, 2012 at 4:12 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
>
> > That approach allows a producer to prevent duplicate messages to the
> > partition, but what about the consumer? In my case, I don't want the
> > consumer to be able to read any of the messages unless it can read all
> > of the messages from a transaction.
> >
> > I also like the idea of there being multiple types of Kafka
> > transaction, though, just to accommodate different performance,
> > reliability, and consumption patterns. Of course, the added complexity
> > of that might just sink the whole thing.
> >
> > --Tom
> >
> > On Fri, Nov 2, 2012 at 4:11 PM, Rohit Prasad <[EMAIL PROTECTED]>
> > wrote:
> > > Getting transactional support is quite a hard problem. There will
> > > always be corner cases where the solution will not work, unless you
> > > want to go down the path of 2PC, Paxos, etc., which of course would
> > > degrade Kafka's performance. It is best to reconcile data and deal with
> > > duplicate messages in the application layer. Having said that, it would
> > > be amazing if we could build "at most once" semantics in Kafka!!
> > >
> > > Regarding the above approaches,
> > > the producer will always have a doubt whether its commit went through,
> > > i.e. if the ack for "commit" is not received by the producer, or if the
> > > producer dies immediately after calling commit. When it is restarted,
> > > how does it know whether the last operation went through?
> > >
> > > I suggest the following -
> > > 1. The producer should attach a timestamp at the beginning of each
> > > message and send it to the server.
> > > 2. On restarts/timeouts/re-connections, the producer should first read
> > > the last committed message from the leader of the partition.
> > > 3. From the timestamp, it can know how many messages went through
> > > before it died (or the connection was broken), and it can infer how
> > > many messages to replay.
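The replay decision in step 3 reduces to a simple filter once each message carries a monotonically increasing tag (the thread suggests a timestamp; a per-producer sequence number works the same way). A minimal sketch, assuming the producer has already read the tag of the last committed message back from the partition leader; how that read happens is client-specific and not shown:

```python
def messages_to_replay(unacked, last_committed_seq):
    """Return the unacknowledged messages that must be resent.

    `unacked` is a list of (seq, payload) pairs the producer sent but never
    saw acked, in send order; `last_committed_seq` is the tag of the last
    message found committed on the partition leader after reconnecting.
    Anything at or below that tag is known to have gone through, so only the
    strictly newer messages are replayed.
    """
    return [(seq, payload) for seq, payload in unacked
            if seq > last_committed_seq]
```

Note the hedge in the original proposal still applies: with wall-clock timestamps this needs the tags to be strictly increasing per producer, which is why a sequence number is the safer choice.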
> > >
> > > The above approach can be used with the existing Kafka libraries, since
> > > you can have a producer and a consumer thread together in an
> > > application to implement this logic. Or someone can take the initiative
> > > to write a transactional producer (which internally has both a producer
> > > and a consumer to read the last committed message). I will be
> > > developing one for Kafka 0.8 in C++.