On 11/30/2010 03:51 AM, Steve Loughran wrote:
> On 30/11/10 03:59, hadoopman wrote:
> you don't need all the files in the cluster in sync as a lot of them
> are intermediate and transient files.
> Instead use dfscopy to copy source files to the two clusters, this
> runs across the machines in the cluster and is also designed to work
> across hadoop versions, with some limitations.
Page 70 in the O'Reilly Hadoop book talks about using distcp to copy data
between two HDFS clusters. I'm curious whether something like that would
also work? Would I just be able to call namenode1 from both clusters
when initiating the copy? Still playing with it. Figured I should ask :-)
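
For reference, here is a rough sketch of what I think the invocation would look like: distcp takes a source and a destination URI, each naming the namenode of its own cluster. The hostnames, paths, and port 8020 below are placeholders I made up (check fs.default.name on each cluster for the real values), and the helper names are just for illustration. The script only builds and prints the command; it doesn't run anything unless run_distcp() is called.

```python
import subprocess


def build_distcp_command(src_namenode, dst_namenode, src_path, dst_path, port=8020):
    """Build the argv list for an inter-cluster `hadoop distcp` run.

    Port 8020 is an assumed namenode RPC port; use whatever each
    cluster's fs.default.name actually specifies.
    """
    src_uri = f"hdfs://{src_namenode}:{port}{src_path}"
    dst_uri = f"hdfs://{dst_namenode}:{port}{dst_path}"
    return ["hadoop", "distcp", src_uri, dst_uri]


def run_distcp(cmd):
    """Run the command; needs the hadoop client on PATH. Raises on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical hostnames and paths, for illustration only.
    cmd = build_distcp_command(
        "namenode1.cluster-a.example.com",
        "namenode1.cluster-b.example.com",
        "/data/source",
        "/data/source",
    )
    print(" ".join(cmd))
```

Since distcp runs as a MapReduce job, the nodes of the cluster running it would need to be able to reach both namenodes and the datanodes on the other side.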