Re: How to minimize message loss when a broker goes down
Joe Stein 2012-08-24, 16:26
We reconcile our data using Hadoop throughout the day to "heal" what we are
live streaming (especially for revenue-generating data).

So our setup is basically (seriously high level):

server -> Kafka -> aggregate -> persist insert (happens in seconds)
&& server -> log file -> Hadoop map/reduce -> re-persist update/overwrite
   (happens every 20-180 minutes)

So we get the best of both worlds and can audit our trade-offs => real-time
analytics of our data processing, plus (within the time window we SLA on for
reporting) 100% accurate data.

One thing I have been meaning to do is keep track of the variance of those
diffs; it is still on the to-do list.
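
A minimal sketch of the "heal" step described above, assuming per-key counts
as the unit of reconciliation; the names (Reconciler, streamingCounts,
batchCounts) are illustrative, not from the actual pipeline:

import java.util.HashMap;
import java.util.Map;

// Hypothetical reconciliation pass: the batch (Hadoop) results are treated
// as authoritative and overwrite the streaming aggregates, while the diff
// between the two is logged so its variance can be tracked over time.
public class Reconciler {
    public static Map<String, Long> reconcile(Map<String, Long> streamingCounts,
                                              Map<String, Long> batchCounts) {
        Map<String, Long> healed = new HashMap<>(streamingCounts);
        for (Map.Entry<String, Long> e : batchCounts.entrySet()) {
            long streamed = healed.getOrDefault(e.getKey(), 0L);
            long diff = e.getValue() - streamed; // the "variance of those diffs"
            if (diff != 0) {
                System.out.printf("key=%s streaming=%d batch=%d diff=%d%n",
                        e.getKey(), streamed, e.getValue(), diff);
            }
            healed.put(e.getKey(), e.getValue()); // batch value wins
        }
        return healed;
    }
}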
On Fri, Aug 24, 2012 at 12:05 PM, Taylor Gautier <[EMAIL PROTECTED]> wrote:

> Re-sending could lead to duplicated messages, however. Consider the case
> where the broker has committed the message and sent the ack, but the ack
> is still sitting in the send buffer of the broker or the receive buffer
> of the producer; if the broker then crashes, the producer never sees the
> ack and will resend a message that was already committed.
>
> Just saying that, using only a simple ack, you trade a relatively small
> amount of loss for a probably smaller amount of duplication.
>
>
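
The duplication trade-off Taylor describes is the usual at-least-once
delivery problem. One common mitigation, not discussed further in this
thread, is to attach a unique id to each message and deduplicate on the
consumer side; a minimal sketch with hypothetical names:

import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer-side deduplication: the producer attaches a unique
// id to every message, and the consumer skips ids it has already processed,
// so a resend after a lost ack becomes harmless.
public class DedupingConsumer {
    // In production this would be a bounded or persistent store, not an
    // unbounded in-memory set.
    private final Set<String> seenIds = new HashSet<>();

    public void handle(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return; // duplicate caused by a producer retry; drop it
        }
        process(payload);
    }

    private void process(String payload) {
        System.out.println("processing: " + payload);
    }
}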
> On Fri, Aug 24, 2012 at 6:49 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Xiaoyu,
> >
> > In 0.7, we have the problem that the producer doesn't receive any ack.
> > So, syncProducer.send is considered successful as soon as the messages
> > are in the socket buffer. If the broker goes down before the socket
> > buffer is flushed, those supposedly successful messages are lost. What's
> > worse, the producer doesn't know this, since it doesn't wait for a
> > response. The number of messages lost this way should be small, but I am
> > not sure how to reduce/prevent it in 0.7. This issue will be addressed
> > in 0.8, in which the producer will receive an ack. If a broker goes down
> > in the middle of a send, the producer will get an exception and can
> > resend.
> >
> > Thanks,
> >
> > Jun
> >
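
For context, the acked send Jun describes maps to a send-and-retry loop on
the producer side. A rough sketch against the 0.8-era Java producer API (the
config values, retry count, broker hosts, and topic name here are assumptions
for illustration):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sketch of an acked send with resend-on-exception: with
// request.required.acks=1 the broker acknowledges each request, so a broker
// failure surfaces as an exception instead of silent loss.
public class AckedSender {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder hosts
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1"); // wait for the leader's ack
        props.put("producer.type", "sync");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        KeyedMessage<String, String> msg =
                new KeyedMessage<String, String>("my-topic", "payload"); // placeholder topic

        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                producer.send(msg);
                break; // acked, so we are done
            } catch (RuntimeException e) { // e.g. FailedToSendMessageException
                if (attempt == 3) throw e;
                // note: resending after a lost ack can duplicate the message,
                // which is exactly the trade-off Taylor points out above
            }
        }
        producer.close();
    }
}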
> > On Thu, Aug 23, 2012 at 5:19 PM, xiaoyu wang <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > We are using a sync producer to push messages to Kafka brokers. It
> > > will stop once it receives an IOException: connection reset by peer.
> > > It seems that when a broker goes down, we lose some messages. I have
> > > reduced "log.flush.interval" to 1 and still see > 200 messages lost.
> > >
> > > I also reduced the batch.size on the producer side to 10, but the
> > > message loss is about the same.
> > >
> > > So, what's the best way to minimize message loss when a broker goes
> > > down?
> > >
> > > Thanks,
> > >
> > > -Xiaoyu
> > >
> >
>
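
For reference, the two settings Xiaoyu mentions live in different places:
log.flush.interval is a broker-side property (server.properties), while
batch.size belongs in the producer's config. A sketch of the values from the
question, with everything else assumed:

import java.util.Properties;

// log.flush.interval=1 would go in the broker's server.properties and forces
// a disk flush after every message; batch.size is set on the producer and
// bounds how many messages sit in one in-flight batch.
public class ConfigSketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("batch.size", "10"); // smaller batches => fewer unacked messages at risk
        System.out.println(producerProps);
    }
}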

--

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/