Hadoop >> mail # user >> HDFS Rsync process??

HDFS Rsync process??
We have two Hadoop clusters in two separate buildings.  Both clusters
load the same data from the same sources (the second cluster is for
DR).

We're looking at how we can recover the primary cluster and catch it
back up again, since new data will continue to feed into the DR cluster.
It's been suggested that we use rsync across the network; however, my
concern is that the amount of data we would have to copy would take
several days (at a minimum) to sync, even with our dual bonded 1 Gbit
network cards.
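
As a rough sanity check on the "several days" concern, here is a minimal
back-of-the-envelope sketch. The 100 TB data size and the 80% link
efficiency are purely hypothetical placeholders (the actual cluster size
isn't stated here), and transfer_days is just an illustrative helper name.
Only the 2 Gbit/s figure comes from the dual bonded 1 Gbit cards.

```python
def transfer_days(data_tb, link_gbit, efficiency=0.8):
    """Estimate the days needed to copy data_tb terabytes over one link.

    data_tb    -- data volume in decimal terabytes (hypothetical input)
    link_gbit  -- raw link speed in Gbit/s (2.0 for two bonded 1 Gbit NICs)
    efficiency -- assumed fraction of raw bandwidth actually usable
    """
    total_bits = data_tb * 1e12 * 8              # terabytes -> bits
    usable_bits_per_sec = link_gbit * 1e9 * efficiency
    seconds = total_bits / usable_bits_per_sec
    return seconds / 86400                       # seconds -> days


# Hypothetical example: 100 TB over a 2 Gbit/s bonded link at 80% efficiency.
print(round(transfer_days(100, 2.0), 1))  # -> 5.8
```

This assumes a single stream limited only by the bonded link; disk
throughput, protocol overhead, and competing traffic could all push the
real number higher.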

I'm curious whether anyone has come up with a solution short of just
loading the source logs into HDFS.  Is there even a way to rsync two
clusters and get them in sync?  I've been googling around but haven't
found anything of substance yet.

Thanks!