RE: copytolocal vs distcp
John Meza 2013-03-09, 19:17
The file:///fs4/outdir solved the outfile location issue. Dhaval Shah made the same suggestion. That's good. But getting Map exceptions now. Given your comment about conventional NAS, this all may be for naught. Let me describe my -planned- workflow:
- export data from hdfs to local-dir (which is a directory on a lun off my Netapp filer)
- copy to portable disk array, send to cloud provider
- import to hdfs
Q: all Maps output to local dirs on each datanode?
Q: 20 dns writing to same lun will have multiple issues:
- possible directory naming collisions?
- bottleneck at controller on filer? I think yes.
Q: I should just start using copytolocal now, hopefully it will complete by Monday am.
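As a rough sanity check on the "done by Monday am" hope, the sustained rate a copy would need can be estimated with a few lines of arithmetic. This is only a sketch: it assumes "~6Tb" means about 6 terabytes (decimal), and the time budgets are hypothetical values, not figures from this thread.

```python
def required_rate_mb_s(total_bytes: float, hours: float) -> float:
    """Sustained throughput in MB/s (10**6 bytes) needed to move
    total_bytes within the given number of hours."""
    return total_bytes / (hours * 3600) / 1e6


# Assumption: "~6Tb" is read as roughly 6 terabytes (6 * 10**12 bytes).
TOTAL_BYTES = 6e12

# Hypothetical time budgets in hours; adjust to the real deadline.
for hours in (24, 36, 48):
    rate = required_rate_mb_s(TOTAL_BYTES, hours)
    print(f"{hours} h budget -> {rate:.1f} MB/s sustained")
```

Comparing these numbers with what the filer (or a single copytolocal stream) can actually sustain gives a quick feel for whether either approach can finish in time.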
From: [EMAIL PROTECTED]
Date: Sat, 9 Mar 2013 14:00:52 -0500
Subject: Re: copytolocal vs distcp
To: [EMAIL PROTECTED]
Symbolic links can also help.
Note that this file system has to be visible with the same path on all hosts. You may also be bandwidth limited by whatever is serving that file system.
There are cases where you won't be limited by the file system. MapR, for instance, has a completely distributed NFS server, and specialized file systems like Lustre might also have distributed network traffic. If you are just writing to a conventional NAS, however, this is unlikely to win much relative to copytolocal simply due to bottlenecking.
On Sat, Mar 9, 2013 at 1:07 PM, John Meza <[EMAIL PROTECTED]> wrote:
I need suggestions on the best methods of copying a lot of data (~6Tb) from a cluster (20-dn) to the local file system.
While distcp has much more throughput compared to copytolocal (I think) because it uses MR jobs, it doesn't seem to work well with the following syntax <desturl> = "file://fs4/outdir/"
Problem: It puts the output in the home dir for the linux user. To get this to work, do I need to redefine the user's home dir to the output dir (lun) with lotsa disk space?
copytolocal is straightforward to use, but lacks the throughput (I think).
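
One likely reason the two-slash form misbehaves is generic URI syntax: in file://fs4/outdir/ the fs4 component sits in the authority (host) position rather than in the path, while file:///fs4/outdir has an empty authority and the full path. The sketch below uses Python's urllib purely to illustrate that split; Hadoop's own path handling may differ in detail, and the helper name split_file_uri is made up for this example.

```python
from urllib.parse import urlparse


def split_file_uri(uri: str):
    """Return (authority, path) for a URI, as parsed by urllib."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Compare the two-slash form from the question with the three-slash form
# that was reported to fix the output location.
for uri in ("file://fs4/outdir/", "file:///fs4/outdir"):
    authority, path = split_file_uri(uri)
    print(f"{uri!r}: authority={authority!r}, path={path!r}")
```

With the two-slash form the path no longer contains fs4 at all, which is consistent with the output not landing under /fs4.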