MapReduce >> mail # user >> copytolocal vs distcp

copytolocal vs distcp
I need suggestions on the best method of copying a lot of data (~6 TB) from a cluster (20 DataNodes) to the local file system.
distcp should have much higher throughput than copyToLocal (I think) because it runs as MR jobs, but it doesn't seem to work well with the following destination syntax: <desturl> = "file://fs4/outdir/"
Problem: it puts the output in the Linux user's home directory. To get this to work, do I need to redefine the user's home directory to be the output directory (a LUN with lots of disk space)?
copyToLocal is straightforward to use, but lacks the throughput (I think).
Suggestions? Advice?

Thanks,
John
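
A note on the destination URL quoted above, offered as a sketch rather than a confirmed diagnosis: under generic URI rules, whatever sits between `file://` and the next slash is the authority (host) component, not part of the path. The Python snippet below shows how the two spellings split. Python is used only for illustration, `split_dest` is a hypothetical helper, and the three-slash form is an assumption about what was intended; neither form was tested against distcp here, and whether this explains the home-directory behaviour is unverified.

```python
from urllib.parse import urlsplit


def split_dest(url):
    """Return (scheme, authority, path) for a destination URL.

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# Form quoted in the post: "fs4" lands in the authority slot.
two_slash = split_dest("file://fs4/outdir/")

# Assumed alternative: empty authority, "fs4" is the first path segment.
three_slash = split_dest("file:///fs4/outdir/")

print(two_slash)
print(three_slash)
```

If `/fs4/outdir` is a local mount point, the second form is the one whose path component actually names it.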