One option would be to use Cassandra for the multi-data-center replication
and have Kafka consumers update the Cassandra ring in each data center.
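A minimal sketch of that consumer-to-ring path, assuming a hypothetical JSON event format and an `events.user_events` table (the topic, keyspace, and host names are illustrative, not from the original setup):

```python
import json

def message_to_cql(msg_bytes):
    """Turn a JSON-encoded Kafka message into a CQL statement plus bind
    parameters for the local Cassandra ring (hypothetical schema)."""
    event = json.loads(msg_bytes)
    stmt = ("INSERT INTO events.user_events (user_id, ts, payload) "
            "VALUES (%s, %s, %s)")
    params = (event["user_id"], event["ts"],
              json.dumps(event.get("payload", {})))
    return stmt, params

# The consumer loop itself needs a running Kafka broker and Cassandra ring,
# so it is sketched here rather than executed:
#
#   from kafka import KafkaConsumer          # pip install kafka-python
#   from cassandra.cluster import Cluster    # pip install cassandra-driver
#
#   session = Cluster(["cassandra-dc1"]).connect()
#   consumer = KafkaConsumer("events", bootstrap_servers=["kafka-dc1:9092"])
#   for record in consumer:
#       stmt, params = message_to_cql(record.value)
#       session.execute(stmt, params)        # write lands on the local ring
```

Each data center runs its own copy of this consumer against its local brokers, and Cassandra's own replication moves the data between rings.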

This allows you to run active / active / active applications.  It also lets
you choose a ring and map/reduce the data out of it (e.g. with Pig); those
jobs will saturate the ring, so run them in an availability zone your
application is not serving out of.
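One way to carve out that dedicated ring is Cassandra's NetworkTopologyStrategy, which sets a replication factor per data center; the data-center names and factors below are illustrative, not from the original deployment:

```sql
CREATE KEYSPACE events
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_app_east': 3,   -- serves the application
    'dc_app_west': 3,   -- serves the application
    'dc_analytics': 3   -- reserved for Pig / map-reduce jobs
  };
```

Pointing the Pig jobs only at `dc_analytics` keeps the batch read load off the rings the application is serving from.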

Every data center has its own "commit log" thanks to the Kafka broker
cluster, with cross-data-center replication of the data handled by the
Cassandra ring.

This is the setup I have used before and continue to work on now.

I have been meaning to start open sourcing some of this (it uses Thrift
On Thu, Oct 31, 2013 at 8:37 AM, Muhzin <[EMAIL PROTECTED]> wrote: