In our case, our aggregator/analytics cluster is in our main datacenter, so there's no risk of the main producers becoming disconnected from it. It seems cleaner to have a dedicated aggregator cluster that gets its data only via MirrorMaker (Option A), but in our case this isn't necessary.
The aggregator cluster can use MirrorMaker to consume from the remote datacenters while regular local producers still send it data directly (Option B).
On Aug 20, 2013, at 1:47 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> We do something like A (though I'm not sure I understand B):
> Essentially what we wanted was that each datacenter stood alone so that we
> would not lose data if the datacenters became disconnected. Network
> partitions within our datacenters are extremely rare, but between
> datacenters they are relatively common.
> On Tue, Aug 20, 2013 at 10:35 AM, Andrew Otto <[EMAIL PROTECTED]> wrote:
>> Hi all!
>> Wikimedia is investigating how best to set up Broker clusters in multiple
>> data centers. Our main analytics Broker cluster is currently in our main
>> datacenter. It is possible for all of the main DC's frontend producers to
>> produce directly to our analytics cluster, but we're not sure if this is a
>> best practice. So! What does LinkedIn recommend?
>> Option A: N + 1 clusters.
>> - N production Broker Clusters (1 for each DC).
>> - +1 aggregator/analytics Broker cluster that mirrors all of the
>> production clusters.
>> Option B: N total Broker clusters.
>> - Frontend producers in the main DC produce directly to the
>> aggregator/analytics cluster.
>> - Other DCs' clusters are mirrored to the aggregator/analytics cluster.
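To make the trade-off between the two options concrete, here is a minimal, hypothetical sketch that models each topology as a list of data flows. It does not use any Kafka or MirrorMaker API; the DC names (`dc1`, `dc2`, `dc3`), the `producers@dc` / `prod@dc` / `aggregator@dc` naming, and the helper functions are illustrative assumptions. It checks Jay's requirement that direct produce traffic never crosses a datacenter boundary (only MirrorMaker traffic does), and counts clusters to show the N + 1 vs. N difference.

```python
from dataclasses import dataclass

# Hypothetical topology model; names and helpers are illustrative, not a Kafka API.


@dataclass(frozen=True)
class Flow:
    source: str     # e.g. "producers@dc2" or "prod@dc2"
    target: str     # cluster receiving the data, e.g. "aggregator@dc1"
    mirrored: bool  # True if carried by MirrorMaker, False if produced directly


def option_a(dcs, main):
    """N + 1 clusters: one production cluster per DC, all mirrored into the aggregator."""
    flows = []
    for dc in dcs:
        flows.append(Flow(f"producers@{dc}", f"prod@{dc}", mirrored=False))
        flows.append(Flow(f"prod@{dc}", f"aggregator@{main}", mirrored=True))
    return flows


def option_b(dcs, main):
    """N clusters: main-DC producers write straight to the aggregator; other DCs are mirrored."""
    flows = [Flow(f"producers@{main}", f"aggregator@{main}", mirrored=False)]
    for dc in dcs:
        if dc == main:
            continue
        flows.append(Flow(f"producers@{dc}", f"prod@{dc}", mirrored=False))
        flows.append(Flow(f"prod@{dc}", f"aggregator@{main}", mirrored=True))
    return flows


def dc_of(endpoint):
    """Return the datacenter part of a "name@dc" endpoint."""
    return endpoint.split("@", 1)[1]


def producers_stay_local(flows):
    """True if every direct (non-mirrored) produce stays inside one DC, so an
    inter-DC partition only delays mirroring instead of blocking producers."""
    return all(dc_of(f.source) == dc_of(f.target) for f in flows if not f.mirrored)


def cluster_count(flows):
    """Count distinct broker clusters (every endpoint except producer groups)."""
    return len({e for f in flows for e in (f.source, f.target)
                if not e.startswith("producers@")})


if __name__ == "__main__":
    dcs = ["dc1", "dc2", "dc3"]
    print(cluster_count(option_a(dcs, "dc1")))         # 4, i.e. N + 1
    print(cluster_count(option_b(dcs, "dc1")))         # 3, i.e. N
    print(producers_stay_local(option_b(dcs, "dc1")))  # True
```

Both options pass the locality check as long as the aggregator lives in the same DC as the producers that write to it directly, which matches the point above that the main producers cannot become disconnected from the aggregator. Option B only breaks that property if the aggregator is moved out of the main DC.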