Kafka, mail # user - Replication questions


Re: Replication questions
Felix GV 2012-04-26, 21:03
Thanks Jun :)

--
Felix

On Thu, Apr 26, 2012 at 3:26 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Some comments inlined below.
>
> Thanks,
>
> Jun
>
> On Thu, Apr 26, 2012 at 10:27 AM, Felix GV <[EMAIL PROTECTED]> wrote:
>
> > Cool :) Thanks for those insights :) !
> >
> > I changed the subject of the thread, in order not to derail the original
> > thread's subject...! I just want to recap to make sure I (and others)
> > understand all of this correctly :)
> >
> > So, if I understand correctly, with acks == [0,1] Kafka should provide a
> > latency that is similar to what we have now, but with the possibility of
> > losing a small window of unreplicated events in the case of an
> > unrecoverable hardware failure, and with acks > 1 (or acks == -1) there
> > will probably be a latency penalty but we will be completely protected from
> > (non-correlated) hardware failures, right?
> >
> This is mostly true. The difference is that in 0.7, the producer doesn't wait
> for a TCP response from the broker. In 0.8, the producer always waits for a
> response from the broker. How quickly the broker sends the response depends on
> acks. If acks is less than ideal, you may get the response faster, but have
> some risk of losing the data if there is a broker failure.
>
>
> > Also, I guess the above assumptions are correct for a batch size of 1, and
> > that bigger batch sizes could also lead to small windows of unwritten data
> > in cases of failures, just like now...? Although, now that I think of it, I
> > guess the vulnerability of bigger batch sizes would, again, only come into
> > play in scenarios of unrecoverable correlated failures, since even if a
> > machine fails with some partially committed batch, there would be other
> > machines who received the same data (through replication) and would have
> > enough time to commit those batches...
> >
> I want to add that if the producer itself dies, it could lose a batch of
> events.
>
>
> > Finally, I guess that replication (whatever the ack parameter is) will
> > affect the overall throughput capacity of the Kafka cluster, since every
> > node will now be writing its own data as well as the replicated data from
> > +/- 2 other nodes, right?
> >
> > --
> > Felix
> >
> >
> >
> > On Wed, Apr 25, 2012 at 6:32 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > Short answer is yes, both async (acks=0 or 1) and sync replication
> > > (acks > 1) will be supported.
> > >
> > > -Jay
> > >
> > > On Wed, Apr 25, 2012 at 11:22 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > Felix,
> > > >
> > > > Initially, we thought we could keep the option of not sending acks from the
> > > > broker to the producer. However, this seems hard since in the new wire
> > > > protocol, we need to send at least the error code to the producer (e.g., a
> > > > request is sent to the wrong broker or wrong partition).
> > > >
> > > > So, what we allow in the current design is the following. The producer can
> > > > specify the # of acks in the request. By default (acks = -1), the broker
> > > > will wait for the message to be written to all replicas that are still
> > > > synced up with the leader before acking the producer. Otherwise (acks >= 0),
> > > > the broker will ack the producer after the message is written to acks
> > > > replicas. Currently, acks=0 is treated the same as acks=1.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
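
As an illustration only, the ack rule described in the message above can be modeled as a small function. This is a sketch, not Kafka code: the function name `replicas_to_wait_for` and the parameter `in_sync_replicas` are made up for this example, and the case where acks is larger than the number of in-sync replicas is passed through unchanged because the thread does not say how the broker handles it.

```python
def replicas_to_wait_for(acks: int, in_sync_replicas: int) -> int:
    """Return how many replicas must have written a message before the
    broker acks the producer, following the rule quoted in this thread.

    Both names are hypothetical; they do not come from the Kafka codebase.
    """
    if acks < -1:
        raise ValueError("acks must be -1 or a non-negative integer")
    if in_sync_replicas < 1:
        raise ValueError("there must be at least one in-sync replica (the leader)")

    if acks == -1:
        # Default: wait for every replica still synced up with the leader.
        return in_sync_replicas

    # acks >= 0: wait for `acks` replicas; acks=0 is treated the same as acks=1.
    # Assumption: values above in_sync_replicas are returned as given, since
    # the thread does not describe that case.
    return max(acks, 1)


if __name__ == "__main__":
    for acks in (-1, 0, 1, 2):
        print(acks, replicas_to_wait_for(acks, in_sync_replicas=3))
```

Under this model, acks=0 and acks=1 both return after the first write, which matches the "async" case discussed in the thread, while acks=-1 scales with however many replicas are currently in sync.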
> > > > On Wed, Apr 25, 2012 at 10:39 AM, Felix GV <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >> Just curious, but if I remember correctly from the time I read KAFKA-50 and
> > > > >> the related JIRA issues, you guys plan to implement sync AND async
> > > > >> replication, right?
> > > >>
> > > >> --
> > > >> Felix
> > > >>
> > > >>
> > > >>
> > > > >> On Tue, Apr 24, 2012 at 4:42 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > Right now we do sloppy failover. That is when a broker goes down
> > > >> > traffic is redirected to the remaining machines, but any