Andrew Otto 2013-08-20, 17:35
Jay Kreps 2013-08-20, 17:47
Andrew Otto 2013-08-20, 17:57
Maxime Brugidou 2013-08-22, 06:39
Great, thanks for the answers all!
What about just for redundancy/maintenance purposes? I know that with replication, individual brokers are redundant anyway, but might it ever be nice to be able to take the analytics Kafka cluster offline completely without worrying about losing data from the frontend producers?
On Aug 22, 2013, at 2:38 AM, Maxime Brugidou <[EMAIL PROTECTED]> wrote:
> We sort of have the same situation where our analytics DC is one of the
> main producer DC too. If you use Kafka only for analytics it is fine to
> produce directly to the analytics cluster from that DC and mirror the rest.
> However we also want to be able to run things locally that will consume
> local data from the local clusters for near real-time applications. This
> can't be done in the central DC in this situation since all data will be
> aggregated. The N+1 solution is more flexible if you need that.
> On Aug 20, 2013 7:57 PM, "Andrew Otto" <[EMAIL PROTECTED]> wrote:
>> In our case, our aggregator/analytics cluster is in our main datacenter,
>> so there's no risk of the main producers becoming disconnected from it. It
>> seems nicer to have a dedicated aggregator cluster that only gets its
>> data via MirrorMaker (Option A), but in our case this isn't necessary.
>> The aggregator cluster could use MirrorMaker to consume from remote
>> datacenters, but still have regular local producers send it data directly
>> (Option B).
>> On Aug 20, 2013, at 1:47 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>> We do something like A (though I'm not sure I understand B):
>>> Essentially what we wanted was that each datacenter stood alone so that
>>> we would not lose data if the datacenters became disconnected. Network
>>> partitions within our data centers are extremely rare but between
>>> datacenters relatively common.
>>> On Tue, Aug 20, 2013 at 10:35 AM, Andrew Otto <[EMAIL PROTECTED]> wrote:
>>>> Hi all!
>>>> Wikimedia is investigating how best to set up Broker clusters in multiple
>>>> data centers. Our main analytics Broker cluster is currently in our main
>>>> datacenter. It is possible for all of the main DC's frontend producers to
>>>> produce directly to our analytics cluster, but we're not sure if this is a
>>>> best practice. So! What does LinkedIn recommend?
>>>> Option A: N + 1 clusters.
>>>> - N production Broker Clusters (1 for each DC).
>>>> - +1 aggregator/analytics Broker cluster that mirrors all of the
>>>> production clusters.
>>>> Option B: N total Broker clusters.
>>>> - Frontend producers in the main cluster produce directly to the
>>>> aggregator/analytics cluster.
>>>> - Other DC's clusters are mirrored to the aggregator/analytics cluster.
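
To make the difference between the two options concrete, here is a minimal Python sketch that models each layout as a set of clusters, producer targets, and MirrorMaker links. It is only an illustration: the cluster naming scheme (`kafka-<dc>`), the `aggregator` name, and the datacenter names are all made up for the example, and nothing here talks to Kafka.

```python
from dataclasses import dataclass, field

# Hypothetical name for the aggregator/analytics cluster.
AGGREGATOR = "aggregator"


@dataclass
class Topology:
    clusters: set = field(default_factory=set)
    produce_to: dict = field(default_factory=dict)  # DC -> cluster its producers write to
    mirrors: set = field(default_factory=set)       # (source, target) MirrorMaker links


def local_cluster(dc):
    # Hypothetical naming scheme for a per-DC production cluster.
    return f"kafka-{dc}"


def option_a(datacenters):
    """N + 1 clusters: every DC has a local cluster; the aggregator only mirrors."""
    topo = Topology(clusters={AGGREGATOR})
    for dc in datacenters:
        local = local_cluster(dc)
        topo.clusters.add(local)
        topo.produce_to[dc] = local
        topo.mirrors.add((local, AGGREGATOR))
    return topo


def option_b(datacenters, main_dc):
    """N clusters: main DC producers write directly to the aggregator."""
    topo = Topology(clusters={AGGREGATOR})
    for dc in datacenters:
        if dc == main_dc:
            topo.produce_to[dc] = AGGREGATOR
            continue
        local = local_cluster(dc)
        topo.clusters.add(local)
        topo.produce_to[dc] = local
        topo.mirrors.add((local, AGGREGATOR))
    return topo


def direct_producers(topo):
    """DCs whose producers write straight to the aggregator cluster."""
    return sorted(dc for dc, c in topo.produce_to.items() if c == AGGREGATOR)
```

With three hypothetical datacenters `["main", "dc2", "dc3"]`, `option_a` yields 4 clusters and `direct_producers` returns an empty list, so the aggregator could be taken offline without any producer noticing. `option_b` with `main_dc="main"` yields 3 clusters, and `direct_producers` returns `["main"]`, which is exactly the set of producers that would be affected by aggregator downtime.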