Kafka >> mail # user >> Mirroring datacenters without vpn


Re: Mirroring datacenters without vpn
>
> Ops proposed setting up the mirror over the open internet, without a
> secured VPN. Security of this particular data is not a concern and, as I
> understand it, this would give us more bandwidth (unless we buy some extra
> hardware; lots of internal details there).
>
> Is this configuration possible at all? Has anyone tried, or is anyone using,
> such a configuration? I'd appreciate any feedback.
>
> The major source of confusion is how MirrorMaker and other producers would
> handle external names for the brokers. As I understand it, a producer
> connects to the broker in its configuration only to bootstrap (get the list
> of all available brokers), and after that talks to the brokers it received
> during bootstrapping. So local clients won't work (or will route to the
> external interface) if I configure the brokers to use external names, and
> remote clients won't work if internal names are configured.
> Is there some reasonable way to configure Kafka to support such a scenario?

Would this feature help in your case?
https://issues.apache.org/jira/browse/KAFKA-1092
That is, you can configure the broker to publish a separate hostname to
ZooKeeper, which is what the producers should use when actually sending
data. So you would need to override the advertised.host.name and
advertised.port properties.
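
For illustration, a broker's server.properties might look roughly like the
sketch below once that feature is available. The hostnames and port values
are placeholders I made up, not values from this thread:

```properties
# server.properties on one broker (sketch; hostnames are hypothetical)
# Address the broker binds to inside the data center
host.name=broker1.internal.example.com
port=9092
# Hostname and port published to ZooKeeper for clients to use (KAFKA-1092)
advertised.host.name=broker1.public.example.com
advertised.port=9092
```

Keep in mind that every client, local or remote, receives the advertised
name, so it has to resolve to a reachable address from both sides. How you
arrange that (split DNS, for example) depends on your network and is an
assumption on my part.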

>
> Also, should I run MirrorMaker in the same DC as the central Kafka cluster,
> or multiple MirrorMakers in the remote DCs?
>
> Any description of how it is set up in your case would be helpful. Do you
> use a VPN between DCs? Where do you run MirrorMaker, in the central DC or in
> the remote ones, and why?

We generally run the mirror-maker in the target data center, i.e., we
do a remote consume but a local produce. If you have a flaky connection
between the two clusters, the consumers may hit session expirations
and rebalance, which reduces the overall throughput. You can
also do local consumption and remote produce, although we have not
tried that. In either case you will need to set a large socket buffer
to help amortize the high network latencies.
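
As a rough sketch of the remote-consume setup, the mirror-maker's consumer
config could look something like this. It assumes the 0.8 high-level
consumer property names; the ZooKeeper address, group name, and all numeric
values are illustrative only and would need tuning for your link:

```properties
# consumer.properties for the mirror-maker (sketch; values are illustrative)
# ZooKeeper of the remote (source) cluster; hostname is hypothetical
zookeeper.connect=zk1.source.example.com:2181
group.id=mirror-maker-group
# Larger socket receive buffer for a high-latency link (here 2 MB)
socket.receive.buffer.bytes=2097152
# Extra headroom before a flaky link triggers a session expiration
zookeeper.session.timeout.ms=30000
```

For the remote-produce variant, the analogous knob should be the producer's
send.buffer.bytes, though as noted we have not run that setup ourselves.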

Thanks,

Joel
 