Kafka, mail # dev - Re: Any update on the "distributed commit" problem?


Re: Any update on the "distributed commit" problem?
Neha Narkhede 2013-03-25, 19:34
Today, the only safe way of controlling consumer state management is
by using the SimpleConsumer. The application is responsible for
checkpointing offsets. So, in your example, when you commit a database
transaction, you can store your consumer's offset as part of the txn.
So either your txn succeeds and the offset moves ahead or your txn
fails and the offset stays where it is.
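
A minimal sketch of that idea, assuming a relational store (SQLite here, purely for illustration) and a batch of (offset, payload) pairs that the application has already fetched with the SimpleConsumer. The table names and the helpers open_store, load_offset and process_batch are hypothetical and not part of Kafka; the fetch call itself is left out.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " topic TEXT, part INTEGER, off INTEGER, payload TEXT,"
        " PRIMARY KEY (topic, part, off))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS consumer_offsets ("
        " topic TEXT, part INTEGER, next_offset INTEGER,"
        " PRIMARY KEY (topic, part))"
    )
    conn.commit()
    return conn

def load_offset(conn, topic, partition):
    """Return the next offset to fetch, or 0 if nothing was checkpointed yet."""
    row = conn.execute(
        "SELECT next_offset FROM consumer_offsets WHERE topic = ? AND part = ?",
        (topic, partition),
    ).fetchone()
    return row[0] if row else 0

def process_batch(conn, topic, partition, batch):
    """Store a batch of (offset, payload) pairs and the new offset atomically.

    Assumes the batch is ordered by offset, as returned by a fetch request.
    """
    if not batch:
        return load_offset(conn, topic, partition)
    next_offset = batch[-1][0] + 1
    # "with conn" commits if the block succeeds and rolls back on any
    # exception, so the results and the offset move together or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO results VALUES (?, ?, ?, ?)",
            [(topic, partition, off, payload) for off, payload in batch],
        )
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets VALUES (?, ?, ?)",
            (topic, partition, next_offset),
        )
    return next_offset
```

On restart, the application would call load_offset and start fetching from that offset, so a failed transaction simply causes the same batch to be fetched and processed again.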

Kafka 0.9 is when we will attempt to merge the high level and low
level consumer APIs, move the offset management to the broker and
offer stronger offset checkpointing guarantees.

Thanks,
Neha

On Mon, Mar 25, 2013 at 11:36 AM, Darren Sargent
<[EMAIL PROTECTED]> wrote:
> This is where you are reading messages from a broker, doing something with the messages, then committing them to some permanent storage such as HBase. There is a race condition in committing the offsets to ZooKeeper: if the DB write succeeds but the ZK commit fails for any reason, you'll get a duplicate batch next time you query the broker. If you commit to ZK first and the commit to the DB then fails, you lose data.
>
> The Kafka white paper mentions that Kafka stays agnostic about the distributed commit problem. There has been some prior discussion about this, but I haven't seen any solid solutions. If you're using something like PostgreSQL that supports two-phase commits, you can roll the offset into the DB transaction, assuming you're okay with storing offsets in the DB rather than in ZK, but that's not a general solution.
>
> Is there anything in Kafka 0.8.x that helps address this issue?
>
> --Darren Sargent
> RichRelevance (www.richrelevance.com)