I need suggestions on the best way to copy a lot of data (~6 TB) from a cluster (20 DataNodes) to the local file system.
While distcp should have much higher throughput than copyToLocal (I think) because it runs as MapReduce jobs, it doesn't seem to work well with a destination like <desturl> = "file://fs4/outdir/".
Problem: it puts the output in the Linux user's home directory. To get this to work, do I need to redefine the user's home directory to point at the output directory (a LUN with lots of disk space)?
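One thing I'm unsure about is how the destination URL itself gets split up. The snippet below is only a Python sketch of generic URI parsing, not Hadoop's own path handling, so it may or may not explain the home-directory behavior. The helper name split_file_uri is made up for illustration, and the three-slash form assumes /fs4/outdir/ is an absolute path on the local machine rather than a host named fs4.

```python
from urllib.parse import urlparse


def split_file_uri(uri):
    """Return (authority, path) as a generic URI parser sees them."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Two slashes: "fs4" is read as the authority (host), not as a directory.
print(split_file_uri("file://fs4/outdir/"))

# Three slashes: empty authority, so the whole /fs4/outdir/ is the path.
# Assumption: /fs4/outdir/ is meant as an absolute local path.
print(split_file_uri("file:///fs4/outdir/"))
```

If fs4 is meant to be a top-level directory rather than a host, the three-slash form may be worth trying as the distcp destination.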
copyToLocal is straightforward to use, but lacks the throughput (I think).