Kafka, mail # user - RE: storing last processed offset, recovery of failed message processing etc.


Re: storing last processed offset, recovery of failed message processing etc.
Benjamin Black 2013-12-09, 20:46
You might look at Curator http://curator.apache.org/
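The per-step tracking asked about in the quoted message below can be sketched roughly as follows. This is only an illustration, not Curator's API: `StepTracker` and `process_signup` are hypothetical names, and the in-memory set stands in for a durable store such as ZooKeeper.

```python
from typing import Callable, List, Set, Tuple


class StepTracker:
    """Hypothetical in-memory stand-in for a durable store such as ZooKeeper."""

    def __init__(self) -> None:
        self._done: Set[Tuple[int, str]] = set()

    def is_done(self, offset: int, step: str) -> bool:
        return (offset, step) in self._done

    def mark_done(self, offset: int, step: str) -> None:
        self._done.add((offset, step))


def process_signup(
    offset: int,
    steps: List[Tuple[str, Callable[[], None]]],
    tracker: StepTracker,
) -> None:
    """Run the steps in order, skipping any already recorded for this offset."""
    for name, action in steps:
        if tracker.is_done(offset, name):
            continue  # completed before the crash; do not repeat it on replay
        action()
        tracker.mark_done(offset, name)
```

A crash between `action()` and `mark_done()` can still repeat that one step on replay, so this narrows the duplicate window to a single step rather than removing it.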
On Mon, Dec 9, 2013 at 12:36 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> Say I am doing this, a scenario that I just came up with that demonstrates
> #2.
>
> Someone signs up on a website, and you have to:
>
> 1. create the user profile
> 2. send confirmation email
> 3. resize avatar
>
>
> Now once a person registers on a website, I write a message to Kafka.
>
> Now I have 3 different things to process (1,2,3). If I get to #2 and then
> the server loses power, then on replay I will send the confirmation email
> a second time.  Sure, in this case it's not that big of a deal, but just
> pretend it is; what should be done?
>
> I guess I have to keep track of state then per step in ZK right? I mean
> that's the only way so I guess I am answering my own question but was
> hoping for people with real-life experience to chime in.
>
> I could write 3 messages to kafka, but maybe order is important :)
>
>
> On Mon, Dec 9, 2013 at 3:31 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>
> > We use Zookeeper, as is standard with Kafka.
> >
> > Our systems are idempotent, so we only store offsets when the message is
> > fully processed. If this means we occasionally replay a message due to some
> > corner-case, or simply a restart, it doesn't matter.
> >
> > Philip
> >
> >
> > On Mon, Dec 9, 2013 at 12:28 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >
> > > I was hoping people could comment on how they handle the following
> > > scenarios:
> > >
> > > 1. Storing the last successfully processed messageId/Offset.  Are people
> > > using MySQL, Redis, etc.?  What are the tradeoffs here?
> > >
> > > 2. How do you handle recovering from an error while processing a given
> > > event?
> > >
> > > There are various scenarios for #2, like:
> > > 1. Do you mark the start of processing a message somewhere, then update
> > > the status to complete, and THEN update the last message processed for #1?
> > > 2. Do you only mark the status as complete, and not the start of processing
> > > it?  I guess this depends on whether there are intermediate steps, where
> > > processing the entire message again would result in some duplicated work, right?
> > >
> >
>
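The approach Philip describes above (idempotent processing, with the offset stored only once a message is fully processed) might look something like the sketch below. Everything named here is an assumption for illustration: `OffsetStore` is an in-memory stand-in for ZooKeeper or any other store, and a plain list stands in for a Kafka partition.

```python
from typing import Dict, List, Optional, Tuple


class OffsetStore:
    """Hypothetical stand-in for wherever offsets live (ZooKeeper, MySQL, Redis)."""

    def __init__(self) -> None:
        self.last_processed: Optional[int] = None

    def commit(self, offset: int) -> None:
        self.last_processed = offset


def consume(
    messages: List[Tuple[str, str]],
    store: OffsetStore,
    profiles: Dict[str, str],
) -> None:
    """Process messages after the last committed offset, committing after each."""
    start = 0 if store.last_processed is None else store.last_processed + 1
    for offset in range(start, len(messages)):
        user_id, name = messages[offset]
        # Idempotent upsert: applying the same message twice leaves the same state.
        profiles[user_id] = name
        # Commit only after the message is fully processed.
        store.commit(offset)
```

Because the write is an upsert keyed by user id, a replay after a lost offset or a restart produces the same `profiles` contents. Side effects that are not naturally idempotent, such as sending an email, need their own deduplication.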