Kafka user mailing list: Complex multi-datacenter setups


Jun Rao 2013-07-12, 00:19
Re: Complex multi-datacenter setups
What we have at LinkedIn is an extra aggregate cluster per data center. We
use MirrorMaker to copy data from the local cluster in each of the data
centers to the aggregate one.
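A minimal sketch of that local-to-aggregate copy, assuming the 0.8-era
MirrorMaker tool, 0.8-style property names, and hypothetical hostnames
(zk-local-dc1, kafka-aggregate-dc1):

    # consumer.properties: the LOCAL cluster's ZooKeeper
    zookeeper.connect=zk-local-dc1:2181
    group.id=mirror-to-aggregate

    # producer.properties: the AGGREGATE cluster's brokers
    metadata.broker.list=kafka-aggregate-dc1:9092

    # one MirrorMaker instance per data center, copying local -> aggregate
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config consumer.properties \
      --producer.config producer.properties \
      --whitelist '.*'

Consumers that need the full cross-datacenter view then read from the
aggregate cluster, while producers keep writing only to their local one.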

Thanks,

Jun
On Thu, Jul 11, 2013 at 5:18 PM, Maxime Petazzoni <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I was wondering if anybody here has, and is willing to share, experience
> with designing and operating complex multi-datacenter/multi-cluster
> Kafka deployments in which data must flow from and to several distinct
> Kafka clusters with more complex semantics than what MirrorMaker
> provides.
>
> The general, very sensible consensus is that producers of data should
> publish to a local Kafka cluster. But if that data is produced in
> multiple datacenters, and must be consumed in multiple datacenters as
> well, then you need to implement data routing and filtering to organise
> your pipeline.
>
> Imagine the following scenario, with three datacenters A, B and C. Data
> is being produced (of the same kind, to the same topic) in all three
> datacenters. Both datacenters A and B have consumers that want all the
> data generated in all three datacenters, but C is only interested in a
> subset of what is produced in A and B (according to some specific
> filters, for example).
>
> This means you have data flowing in both directions between each
> datacenter. You need some kind of source-based filtering to prevent
> data going back and forth ad vitam aeternam, as well as something to
> implement the custom filtering logic where needed, which also means
> you'd need to wrap all data in an envelope that records where the data
> was published from.
>
> Is this kind of deployment pretty common in the industry/among the
> users of Kafka? I haven't found much online that would help put
> together this type of architecture. Is it basically roll-your-own, with
> something similar to MirrorMaker that has a consumer, a filtering
> component and a producer, placing a couple of these in each direction
> between each pair of clusters? (A sketch of this idea follows the
> quoted message below.)
>
> It ultimately boils down to pretty simple "routing" of data, just in a
> more complex manner than having all data flow to a single sink location.
>
>
> Let me know what you folks think!
>
> TIA,
> /Max
> --
> Maxime Petazzoni
> Sr. Platform Engineer
> m 408.310.0595
> www.turn.com
>
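A minimal sketch of such a consume-filter-produce router, assuming the
0.8-era Java clients, hypothetical hostnames, and a hypothetical
"sourceDC|payload" envelope format: each message carries its datacenter
of origin, and a router forwards only what originated locally, so
mirrored copies never loop back between clusters.

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.javaapi.producer.Producer;
    import kafka.message.MessageAndMetadata;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class FilteringRouter {
      private static final String LOCAL_DC = "A";    // this router's datacenter
      private static final String TOPIC = "events";  // hypothetical topic name

      // Pull the origin datacenter out of the "<sourceDC>|<payload>" envelope.
      static String sourceDC(byte[] value) {
        String s = new String(value, StandardCharsets.UTF_8);
        int sep = s.indexOf('|');
        return sep < 0 ? "" : s.substring(0, sep);
      }

      public static void main(String[] args) {
        // Consumer side: the LOCAL cluster (0.8-era high-level consumer).
        Properties c = new Properties();
        c.put("zookeeper.connect", "zk-local:2181");  // hypothetical host
        c.put("group.id", "router-a-to-b");
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(c));

        // Producer side: the REMOTE cluster this router feeds.
        Properties p = new Properties();
        p.put("metadata.broker.list", "kafka-remote:9092");  // hypothetical host
        p.put("serializer.class", "kafka.serializer.DefaultEncoder");
        Producer<byte[], byte[]> producer =
            new Producer<byte[], byte[]>(new ProducerConfig(p));

        KafkaStream<byte[], byte[]> stream = connector
            .createMessageStreams(Collections.singletonMap(TOPIC, 1))
            .get(TOPIC).get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
          MessageAndMetadata<byte[], byte[]> m = it.next();
          // Forward only locally-produced data: mirrored copies are dropped,
          // so nothing bounces back and forth between clusters. A router
          // feeding datacenter C would also apply its subset filter here.
          if (LOCAL_DC.equals(sourceDC(m.message()))) {
            producer.send(
                new KeyedMessage<byte[], byte[]>(TOPIC, m.key(), m.message()));
          }
        }
      }
    }

One such router per direction per cluster pair, as the message suggests,
would cover the A/B/C scenario above.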

 
Maxime Petazzoni 2013-07-12, 16:30
Jun Rao 2013-07-12, 16:37