Kafka, mail # user - Combating network latency best practice


Re: Combating network latency best practice
Calvin Lei 2013-07-11, 04:17
Thanks Jay. We would still suffer from network latency if we used remote
writes. We will probably explore the idea of running a local cluster in each
DC and mirroring messages across DCs.
thanks,
Cal
On Wed, Jul 10, 2013 at 12:04 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> To publish to a remote data center just configure the producers with the
> host/port of the remote datacenter. To ensure good throughput you may want
> to tune the socket send and receive buffers on the client and server to
> avoid small roundtrips:
> http://en.wikipedia.org/wiki/Bandwidth-delay_product
>
> -Jay
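Editor's note: the bandwidth-delay product linked above can be worked through with a little arithmetic. The sketch below is only an illustration; the 1 Gbit/s link, the 100 ms round trip, and the 100 KiB buffer are made-up example values, and both helper functions are hypothetical names, not part of Kafka. Which client and broker settings hold the socket buffer sizes depends on the Kafka version, so check its configuration documentation.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the link full (bandwidth x delay)."""
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)


def max_throughput_bps(buffer_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most one buffer is sent per round trip."""
    return buffer_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    link_bps = 1_000_000_000   # assumed 1 Gbit/s inter-DC link
    rtt_ms = 100               # assumed 100 ms round-trip time

    print("buffer needed to fill the link:", bdp_bytes(link_bps, rtt_ms), "bytes")
    print("ceiling with a 100 KiB buffer:",
          max_throughput_bps(100 * 1024, rtt_ms), "bit/s")
```

Under these assumed numbers, a buffer much smaller than the bandwidth-delay product caps a single connection well below the link speed, which is why tuning the buffers on both ends matters for remote writes.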
>
>
>
> On Wed, Jul 10, 2013 at 6:57 AM, Calvin Lei <[EMAIL PROTECTED]> wrote:
>
> > Thanks Jay. I thought of using the worldview architecture you suggested.
> > But our consumers are also globally deployed, which means any new
> > messages that arrive in the worldview need to be replicated back to the
> > local DCs, making the topology a bit complicated.
> >
> > Would you please elaborate on the remote write? How do I achieve it?
> > On Jul 10, 2013 1:08 AM, "Jay Kreps" <[EMAIL PROTECTED]> wrote:
> >
> > > Ah, good question; we really should add this to the documentation.
> > >
> > > We run a cluster per data center. All writes always go to the
> > > data-center local cluster. Replication to aggregate clusters that
> > > provide the "world wide" view is done with mirror maker.
> > >
> > > It is also fine to write to or read from a Kafka cluster in a remote
> > > colo, though obviously you have to think about the case where the
> > > cluster is not accessible due to network problems.
> > >
> > > Kafka is not designed to run a single cluster spread across
> > > geographically disparate colos and you would see a few problems in that
> > > scenario. The first is that, as you noted, the latency will be terrible
> > > as it will block on the slowest response from all datacenters. This
> > > could be avoided if you lowered the request.required.acks to 1, but that
> > > would impact durability guarantees. The second problem is that Kafka
> > > will not remain available in the presence of network partitions so if
> > > the inter-datacenter link failed one datacenter would lose its cluster.
> > > Finally we have not done anything to attempt to optimize partition
> > > placement by colo so you would not actually have redundancy between
> > > colos because we would often place all replicas in a single colo.
> > >
> > > -Jay
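Editor's note: the point about blocking on the slowest response can be made concrete with a deliberately simplified model. The sketch below assumes a produce request costs one producer-to-leader round trip, plus (with request.required.acks = -1) the slowest leader-to-follower round trip among the in-sync replicas. It ignores disk writes, batching, and queueing, and the function name and all round-trip times are hypothetical.

```python
def ack_latency_ms(producer_to_leader_rtt_ms, leader_to_follower_rtts_ms, acks):
    """Rough time until the producer sees an acknowledgement.

    acks = 0  : producer does not wait for the broker
    acks = 1  : wait for the leader only
    acks = -1 : wait until every in-sync follower has the message
    """
    if acks == 0:
        return 0
    if acks == 1:
        return producer_to_leader_rtt_ms
    if acks == -1:
        slowest_follower = max(leader_to_follower_rtts_ms, default=0)
        return producer_to_leader_rtt_ms + slowest_follower
    raise ValueError("this sketch only models acks in {0, 1, -1}")


if __name__ == "__main__":
    # Assumed numbers: leader in the local DC, followers in two remote DCs.
    local_rtt = 1
    follower_rtts = [80, 150]

    print("acks=1 :", ack_latency_ms(local_rtt, follower_rtts, 1), "ms")
    print("acks=-1:", ack_latency_ms(local_rtt, follower_rtts, -1), "ms")
```

In this model the acks = -1 latency is dominated by the farthest replica, while acks = 1 returns as soon as the leader has the message, at the cost of the durability guarantee described above.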
> > >
> > >
> > > On Tue, Jul 9, 2013 at 9:34 PM, Calvin Lei <[EMAIL PROTECTED]> wrote:
> > >
> > > > Folks,
> > > >    Our application has multiple producers globally (region1, region2,
> > > > region3). If we group all the brokers together into one cluster, we
> > > > notice obvious network latency when a broker replicates across regions
> > > > with request.required.acks = -1.
> > > >
> > > >    Is there any best practice for combating network latency in the
> > > > deployment topology? Should we segregate the brokers regionally (one
> > > > Kafka cluster per region) and set up MirrorMaker between the regions
> > > > (region1 <--> region2, region2 <--> region3, region1 <--> region3), for
> > > > a total of 6 mirror makers?
> > > >
> > > >
> > > > Thanks.
> > > >
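Editor's note: the count of 6 mirror makers in the question follows from treating each direction between two regions as its own mirroring process. The sketch below enumerates those directed pairs and, for comparison, the one-way links needed if every regional cluster instead feeds a single aggregate cluster as described in the reply. The function names and the "aggregate" label are hypothetical, and the sketch only counts links; it does not configure MirrorMaker.

```python
from itertools import permutations


def full_mesh_mirrors(regions):
    """One (source, destination) mirror per ordered pair of distinct regions."""
    return list(permutations(regions, 2))


def aggregate_mirrors(regions, aggregate="aggregate"):
    """One mirror from each regional cluster into a single aggregate cluster."""
    return [(region, aggregate) for region in regions]


if __name__ == "__main__":
    regions = ["region1", "region2", "region3"]

    mesh = full_mesh_mirrors(regions)
    print(len(mesh), "mirrors for a full mesh:")
    for source, destination in mesh:
        print("  ", source, "->", destination)

    print(len(aggregate_mirrors(regions)), "mirrors into one aggregate cluster")
```

A full mesh grows as n * (n - 1) links for n regions, whereas feeding one aggregate cluster grows linearly.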
> > >
> >
>