We are working on a Kafka-based event collection system that needs to gather events across data centers. Let's say all the events are produced in DC1, while the Kafka brokers and consumers sit in DC2. The round trip between DC1 and DC2 is around 80 ms. We expect around 50 million events a day, peaking at around 5K events a second, and a data volume of around 100 GB a day, peaking at around 10 MB a second. What is the best way to do this?
and DC2. replication by keeping brokers in both DC1 and DC2.
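For a sense of scale, the daily totals quoted in the question can be turned into average rates. This is only a rough sketch: it assumes decimal gigabytes and a perfectly even load over the day, and the variable names are made up for illustration.

```python
# Average rates implied by the daily totals in the question.
# Assumptions: 1 GB = 10**9 bytes, load spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 50_000_000
bytes_per_day = 100 * 10**9

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
avg_event_bytes = bytes_per_day / events_per_day

print(f"average events/s: {avg_events_per_sec:.0f}")
print(f"average MB/s:     {avg_bytes_per_sec / 10**6:.2f}")
print(f"average event:    {avg_event_bytes:.0f} bytes")
```

The averages come out to a few hundred events and roughly a megabyte per second, so the cross-DC link has to be sized for the peaks rather than for the daily mean.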
Kafka replication in 0.8 is designed for a Kafka cluster within a single DC. The following wiki describes cross-DC mirroring with the MirrorMaker tool and how to optimize throughput over a high-latency network.
So we'll have to maintain ZooKeeper and brokers in both DCs, while the producers can be in DC1 and the consumers in the target DC2.
Are there any issues if we keep only the producers in DC1, talking to ZooKeeper and brokers in DC2? I've been able to achieve this by setting a "hostname" entry in the broker properties that maps to the internal IP in DC2 and to the public IP in DC1.
On Mon, Feb 4, 2013 at 10:55 AM, Jun Rao <[EMAIL PROTECTED]> wrote: Thanks & Regards, Apoorva
Thanks, Jun. Is there any parameter exposed through the .properties files? I can see socket.send.buffer, socket.receive.buffer and max.socket.request.bytes in the broker properties file, but nothing in the producer's.
On Mon, Feb 4, 2013 at 10:05 PM, Jun Rao <[EMAIL PROTECTED]> wrote: Thanks & Regards, Apoorva
There is buffer.size in the producer config for setting the producer's socket send buffer size.
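As a rough guide for picking a value, a single TCP connection can keep at most about one send buffer of data in flight per round trip, so the buffer should be at least the bandwidth-delay product of the link. The sketch below assumes the 80 ms round trip from the question, a peak of 10 MB per second on one connection, and a hypothetical 100 KiB buffer for comparison. The helper names are invented for illustration, and OS-level socket limits are ignored.

```python
# Bandwidth-delay product (BDP) estimate for the producer socket send buffer.
# Integer arithmetic in milliseconds keeps the results exact.

def bandwidth_delay_product(bytes_per_sec: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to sustain bytes_per_sec over the link."""
    return bytes_per_sec * rtt_ms // 1000

def max_throughput(buffer_bytes: int, rtt_ms: int) -> int:
    """Approximate bytes/s ceiling of one connection with the given buffer."""
    return buffer_bytes * 1000 // rtt_ms

RTT_MS = 80                    # round trip quoted in the question
PEAK_BYTES_PER_SEC = 10 * 10**6  # assumption: peak of 10 MB per second

needed = bandwidth_delay_product(PEAK_BYTES_PER_SEC, RTT_MS)
ceiling_small = max_throughput(100 * 1024, RTT_MS)  # hypothetical 100 KiB buffer

print(f"minimum buffer for peak rate: {needed} bytes")
print(f"ceiling with a 100 KiB buffer: {ceiling_small} bytes/s")
print(f"buffer.size={needed}")  # candidate line for the producer .properties file
```

In practice some headroom above the computed minimum is sensible, and the broker's socket.receive.buffer would probably need a matching increase.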
On Mon, Feb 4, 2013 at 9:29 AM, Apoorva Gaurav <[EMAIL PROTECTED]> wrote: