Re: copyToLocal vs distcp
Try file:///fs4/outdir
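
Something along these lines should do it (note the three slashes; the HDFS
source path below is just a placeholder, substitute your own):

   # distcp with a local-filesystem destination; /user/jmeza/indir is a placeholder path
   hadoop distcp /user/jmeza/indir file:///fs4/outdir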

Symbolic links can also help.

Note that this file system has to be visible with the same path on all
hosts.  You may also be bandwidth limited by whatever is serving that file
system.
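
If the mount point differs from host to host, a symlink on each node can make
the path uniform, e.g. (the mount point here is hypothetical):

   # run on every node; /mnt/nas/fs4 stands in for wherever the share is actually mounted
   sudo ln -s /mnt/nas/fs4 /fs4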

There are cases where you won't be limited by the file system.  MapR, for
instance, has a completely distributed NFS server, and specialized file
systems like Lustre can also distribute the network traffic. If you are
just writing to a conventional NAS, however, distcp is unlikely to win much
relative to copyToLocal, simply because the NAS becomes the bottleneck.
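
For comparison, the single-process alternative would be roughly (same
placeholder source path as above):

   # copies everything through the one host where the command runs
   hadoop fs -copyToLocal /user/jmeza/indir /fs4/outdir
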
On Sat, Mar 9, 2013 at 1:07 PM, John Meza <[EMAIL PROTECTED]> wrote:

> I need suggestions on the best method of copying a lot of data (~6 TB) from a
> cluster (20 datanodes) to the local file system.
>
> While *distcp* has much more throughput than copyToLocal (I think)
> because it uses MR jobs, it doesn't seem to work well with the following
> syntax:
>    <desturl> = "file://fs4/outdir/"
>
> Problem: it puts the output in the home dir of the Linux user. To get this to
> work I would need to redefine the user's home dir to be the output dir (a LUN)
> with lots of disk space?
>
> *copyToLocal* is straightforward to use, but lacks the throughput (I
> think).
>
> Suggestions? Advice?
> thanks
> John
>