NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Kafka user mailing list


Re: storing last processed offset, recovery of failed message processing etc.
You might look at Curator http://curator.apache.org/
On Mon, Dec 9, 2013 at 12:36 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> Say I am doing the following, a scenario I just came up with that
> demonstrates #2.
>
> Someone signs up on a website, and you have to:
>
> 1. create the user profile
> 2. send a confirmation email
> 3. resize avatar
>
>
> Now, once a person registers on the website, I write a message to Kafka.
>
> Now I have 3 different things to process (1, 2, 3). If I get to #2 and
> then the server loses power, replaying the message will send the
> confirmation email a second time. Sure, in this case it's not that big of
> a deal, but just pretend it is: what should be done?
>
> I guess I then have to keep track of state per step in ZK, right? That
> seems like the only way, so I guess I'm answering my own question, but I
> was hoping people with real-life experience would chime in.
>
> I could write 3 messages to Kafka, but maybe order is important :)
>
>
> On Mon, Dec 9, 2013 at 3:31 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>
> > We use Zookeeper, as is standard with Kafka.
> >
> > Our systems are idempotent, so we only store offsets when the message is
> > fully processed. If this means we occasionally replay a message due to
> > some corner-case, or simply a restart, it doesn't matter.
> >
> > Philip
> >
> >
> > On Mon, Dec 9, 2013 at 12:28 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >
> > > I was hoping people could comment on how they handle the following
> > > scenarios:
> > >
> > > 1. Storing the last successfully processed messageId/offset. Are people
> > > using MySQL, Redis, etc.? What are the tradeoffs here?
> > >
> > > 2. How do you handle recovering from an error while processing a given
> > > event?
> > >
> > > There are various scenarios for #2, like:
> > > 1. Do you mark the start of processing a message somewhere, then update
> > > the status to complete, and THEN update the last message processed for
> > > #1?
> > > 2. Do you only mark the status as complete, and not the start of
> > > processing it? I guess this depends on whether there are intermediate
> > > steps, so that processing the entire message again would result in
> > > some duplicated work, right?
> > >
> >
>
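The per-step bookkeeping S Ahmed describes for the signup example can be sketched as follows. This is a minimal illustration, not code from the thread: `StepStore` is a hypothetical in-memory stand-in for durable state such as ZooKeeper or a database (a real power loss would wipe an in-memory set, so the simulation only models the process dying while the store survives), and the three step functions are placeholders.

```python
from typing import Callable, List, Set, Tuple

Step = Tuple[str, Callable[[str], None]]


class StepStore:
    """Hypothetical stand-in for durable per-step state (ZooKeeper, MySQL, Redis, ...).

    A (message_id, step_name) pair is present once that step has finished.
    """

    def __init__(self) -> None:
        self._done: Set[Tuple[str, str]] = set()

    def is_done(self, msg_id: str, step: str) -> bool:
        return (msg_id, step) in self._done

    def mark_done(self, msg_id: str, step: str) -> None:
        self._done.add((msg_id, step))


def process_message(msg_id: str, user: str, store: StepStore, steps: List[Step]) -> None:
    """Run the steps in order, skipping any that a previous attempt already recorded."""
    for name, action in steps:
        if store.is_done(msg_id, name):
            continue  # finished before the crash: don't redo it (no second email)
        action(user)
        store.mark_done(msg_id, name)  # record only after the step succeeds


# --- Simulated run: the process dies during step 3, then the message is replayed.
sent_emails: List[str] = []
crash_next_resize = True


class PowerLoss(Exception):
    pass


def create_profile(user: str) -> None:
    pass  # placeholder for the real work


def send_confirmation(user: str) -> None:
    sent_emails.append(user)  # placeholder for actually sending mail


def resize_avatar(user: str) -> None:
    global crash_next_resize
    if crash_next_resize:  # fail on the first attempt only
        crash_next_resize = False
        raise PowerLoss()


steps: List[Step] = [
    ("create_profile", create_profile),
    ("send_confirmation", send_confirmation),
    ("resize_avatar", resize_avatar),
]
store = StepStore()

try:
    process_message("msg-42", "alice", store, steps)
except PowerLoss:
    pass  # the offset was never committed, so the consumer sees msg-42 again

process_message("msg-42", "alice", store, steps)  # replay
print(sent_emails)  # ['alice']: the confirmation email went out once
```

Each step is recorded only after it succeeds, so a replay skips finished steps. A crash between running a step and recording it can still repeat that one step, so for something like an email the send itself would also need deduplicating, for example keyed by the message id.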
