Re: copytolocal vs distcp
Try file:///fs4/outdir
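The difference between two and three slashes can be seen with any generic URI parser. The sketch below uses Python's standard `urllib.parse` purely as an illustration; it is not Hadoop's own code path, and Hadoop's handling may differ in detail. The helper name `describe` is made up for this example.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (authority, path) as a generic URI parser splits the URL."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


# Two slashes: "fs4" is read as the authority (host), not as a directory.
two_slashes = describe("file://fs4/outdir/")
# Three slashes: empty authority, and "/fs4/outdir" is the full path.
three_slashes = describe("file:///fs4/outdir")

print(two_slashes)    # ('fs4', '/outdir/')
print(three_slashes)  # ('', '/fs4/outdir')
```

With the two-slash form the path component no longer contains fs4 at all, so the destination is not the directory that was intended.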

Symbolic links can also help.

Note that this file system has to be visible with the same path on all
hosts.  You may also be bandwidth-limited by whatever is serving that file
system.

There are cases where you won't be limited by the file system.  MapR, for
instance, has a completely distributed NFS server, and specialized file
systems like Lustre might also distribute their network traffic. If you are
just writing to a conventional NAS, however, distcp is unlikely to win much
relative to copyToLocal, because every mapper ends up writing through the
same bottleneck.
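As a rough back-of-the-envelope check of the bottleneck argument, here is a small sketch. The link speeds are hypothetical examples, not measurements from any real setup, and it assumes the ~6Tb in the question means about 6 terabytes and that the link to the NAS is the only limit.

```python
def transfer_hours(total_bytes, link_bits_per_sec):
    """Hours needed to push total_bytes through a single link at full rate."""
    bytes_per_sec = link_bits_per_sec / 8
    return total_bytes / bytes_per_sec / 3600


TOTAL_BYTES = 6e12  # assumed: ~6 TB, taken from the original question

# Hypothetical link speeds between the cluster and the NAS.
for label, bits_per_sec in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    print(f"{label}: {transfer_hours(TOTAL_BYTES, bits_per_sec):.1f} h")
```

The number of mappers does not appear in the calculation: if everything funnels through one NAS link, running more copies in parallel does not shorten the transfer.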
On Sat, Mar 9, 2013 at 1:07 PM, John Meza <[EMAIL PROTECTED]> wrote:

> I need suggestions on the best methods of copying a lot of data (~6 TB) from a
> cluster (20 datanodes) to the local file system.
> While *distcp* has much more throughput than copyToLocal (I think)
> because it uses MR jobs, it doesn't seem to work well with the following
> syntax:
>    <desturl> =   "file://fs4/outdir/"
> Problem: it puts the output in the home dir of the Linux user. To get this
> to work, would I need to redefine the user's home dir to be the output dir
> (a LUN) with lots of disk space?
> *copyToLocal* is straightforward to use, but lacks the throughput (I
> think).
> Suggestions? Advice?
> thanks
> John