MapReduce >> mail # user >> Using distcp with Hadoop HA


Using distcp with Hadoop HA
Hello everyone. I am trying to use distcp with a Hadoop HA configuration (using CDH4.0.0 at the moment). Here is my problem:
- I am trying to do a distcp from cluster A to cluster B. Since the standby namenode does not support any operations, I need to either specify the active namenode when running distcp or use the failover proxy provider (dfs.client.failover.proxy.provider.clusterA), where I can specify the two namenodes for cluster B and the failover code inside HDFS will figure out which one is active.
- If I use the failover proxy provider, some of my datanodes on cluster A connect to the namenode on cluster B and vice versa. I assume that is because I have configured both nameservices in my hdfs-site.xml for distcp to work. I have set dfs.nameservice.id to the right one, but the datanodes do not seem to respect it.
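
For readers following along, the two-nameservice setup described above can be sketched roughly as follows. This is only an illustration, not the poster's actual configuration: the nameservice IDs `clusterA` and `clusterB`, the namenode IDs `nn1` and `nn2`, and all host names and ports are placeholder assumptions. The property names and the `ConfiguredFailoverProxyProvider` class are the standard HDFS HA client settings. The script builds the properties and renders them as an hdfs-site.xml fragment.

```python
import xml.etree.ElementTree as ET

# Standard HDFS HA client failover class.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def ha_client_properties(nameservices):
    """Build HA client properties for every nameservice.

    nameservices maps a nameservice ID to a dict of
    namenode ID -> "host:port" RPC address.
    """
    props = {"dfs.nameservices": ",".join(nameservices)}
    for ns, namenodes in nameservices.items():
        props["dfs.ha.namenodes." + ns] = ",".join(namenodes)
        for nn_id, address in namenodes.items():
            props["dfs.namenode.rpc-address.%s.%s" % (ns, nn_id)] = address
        props["dfs.client.failover.proxy.provider." + ns] = FAILOVER_PROVIDER
    return props


def to_hdfs_site_xml(props):
    """Render the properties as an hdfs-site.xml style document string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder topology: IDs, hosts and ports are assumptions for illustration.
EXAMPLE = {
    "clusterA": {"nn1": "nn1.clustera.example.com:8020",
                 "nn2": "nn2.clustera.example.com:8020"},
    "clusterB": {"nn1": "nn1.clusterb.example.com:8020",
                 "nn2": "nn2.clusterb.example.com:8020"},
}

if __name__ == "__main__":
    print(to_hdfs_site_xml(ha_client_properties(EXAMPLE)))
```

With properties like these visible to the distcp client, both clusters can be addressed by logical name (hdfs://clusterA/... and hdfs://clusterB/...) without naming the active namenode. Whether keeping such a file in a client-only configuration directory, separate from the hdfs-site.xml the datanodes read, avoids the cross-cluster registration described above is not confirmed by this post.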

What is the best way to use distcp with a Hadoop HA configuration without having the datanodes connect to the remote namenode? Thanks
 
Regards,
Dhaval
Dhaval Shah 2013-01-29, 23:23