MapReduce user mailing list

copytolocal vs distcp
I need suggestions on the best way to copy a lot of data (~6 TB) from a 20-DataNode cluster to the local file system.
distcp should have much higher throughput than copyToLocal (I think) because it runs as a MapReduce job, but it doesn't seem to work well with this destination URL: <desturl> = "file://fs4/outdir/"
Problem: it puts the output in the Linux user's home directory. To get this to work, do I need to redefine the user's home directory as the output directory (the LUN with lots of disk space)?
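One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): in standard URI syntax, `file://fs4/outdir/` treats `fs4` as the host (authority) and only `/outdir/` as the path, whereas `file:///fs4/outdir/` keeps `/fs4/outdir/` as the path. The sketch below uses only `java.net.URI` from the JDK to show how each form is parsed. The class and method names are hypothetical, and whether this accounts for the home-directory behavior seen with distcp has not been tested here.

```java
import java.net.URI;

// Hypothetical helper: shows how java.net.URI splits a file: URL into
// authority (host) and path. It does not call Hadoop or distcp at all.
public class DestUriCheck {
    // Authority component, or null when the URI has none (e.g. "file:///...").
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    // Path component, i.e. what remains after the scheme and authority.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // Two slashes: "fs4" becomes the host. Three slashes: "fs4" stays in the path.
        for (String u : new String[] {"file://fs4/outdir/", "file:///fs4/outdir/"}) {
            System.out.println(u + " -> authority=" + authorityOf(u) + ", path=" + pathOf(u));
        }
        // prints:
        // file://fs4/outdir/ -> authority=fs4, path=/outdir/
        // file:///fs4/outdir/ -> authority=null, path=/fs4/outdir/
    }
}
```

If the intended local directory really is `/fs4/outdir`, the three-slash form is the one whose path matches it.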
copyToLocal is straightforward to use, but (I think) it lacks the throughput.
Suggestions? Advice?

Thanks,
John