Re: copytolocal vs distcp
Ted Dunning 2013-03-09, 19:00
Symbolic links can also help.
Note that this file system has to be visible with the same path on all
hosts. You may also be bandwidth-limited by whatever is serving that file
system.
There are cases where you won't be limited by the file system. MapR, for
instance, has a completely distributed NFS server and specialized file
systems like Lustre might also have distributed network traffic. If you are
just writing to a conventional NAS, however, this is unlikely to win much
relative to copytolocal simply due to bottlenecking.
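One thing worth checking in the destination URL from the question below: a generic URI parser reads the first component after "file://" as a host, not as a directory. Here is a minimal sketch in Python showing how the two spellings split. The helper name split_file_uri is made up for illustration, and whether distcp treats the authority the same way is an assumption I have not verified against the Hadoop source.

```python
from urllib.parse import urlparse


def split_file_uri(uri):
    """Return (authority, path) as a generic URI parser sees them.

    Hypothetical helper for illustration only; it does not call Hadoop.
    """
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Two slashes: "fs4" is parsed as the authority (host), so only
# "/outdir/" is left as the path.
print(split_file_uri("file://fs4/outdir/"))   # ('fs4', '/outdir/')

# Three slashes: the authority is empty and the full path survives.
print(split_file_uri("file:///fs4/outdir/"))  # ('', '/fs4/outdir/')
```

If the mount point really is /fs4/outdir on every node, the three-slash form is the one that names it.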
On Sat, Mar 9, 2013 at 1:07 PM, John Meza <[EMAIL PROTECTED]> wrote:
> I need suggestions on best methods of copying a lot of data (~6 TB) from a
> cluster (20-dn) to the local file system.
> While *distcp* has much more throughput compared to copytolocal (I think)
> because it uses MR jobs, it doesn't seem to work well with the following
> <desturl> = "file://fs4/outdir/"
> Problem: It puts the output in the home dir for the Linux user. To get this to work I
> need to redefine the user's home dir to the output dir (LUN) with lotsa disk
> *copytolocal* is straightforward to use, but lacks the throughput (I
> Suggestions? Advice?