There is a new feature called concat(), which concatenates files consisting of full blocks.
So the idea is to copy the individual blocks in parallel as separate files, then, once they
are all copied, concatenate them back into the original files.
You will have to write some code to do this or modify distcp.
concat() is in 0.21 and 0.22, but not in 0.20.
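
To make the idea concrete, here is a rough sketch of the copy-blocks-then-concatenate
pattern. It is only an illustration on the local filesystem, not HDFS or distcp code:
the names copy_block, concat_parts and parallel_copy are hypothetical, BLOCK_SIZE is a
tiny stand-in for the HDFS block size, and a thread pool stands in for the map tasks.
concat_parts only mimics what concat() would do by appending the parts in order.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; assumption: tiny stand-in for an HDFS block size


def copy_block(src, part_path, offset, length):
    """Copy one block-sized byte range of src into its own part file."""
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        fin.seek(offset)
        fout.write(fin.read(length))
    return part_path


def concat_parts(target, parts):
    """Local stand-in for concat(): write parts to target in order, then delete them."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as fin:
                shutil.copyfileobj(fin, out)
            os.remove(part)


def parallel_copy(src, dst, block_size=BLOCK_SIZE, workers=4):
    """Copy src to dst by copying block-sized pieces in parallel, then joining them."""
    size = os.path.getsize(src)
    offsets = list(range(0, size, block_size))
    parts = [f"{dst}.part{i}" for i in range(len(offsets))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_block, src, part, off, min(block_size, size - off))
            for part, off in zip(parts, offsets)
        ]
        for future in futures:
            future.result()  # re-raise any copy error before joining
    concat_parts(dst, parts)
```

Every part except possibly the last is exactly one block long, which matches the
"files consisting of full blocks" restriction mentioned above. In a real job each
copy_block call would be a separate map task and the final step would be the
filesystem's concat() rather than a byte-for-byte append.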
On 5/17/2010 5:10 PM, Mridul Muralidharan wrote:
> Is there a way to parallelize copy of really large files ?
> From my understanding, currently each map in distcp copies one file.
> So for really large files, this would be pretty slow if number of files
> is really large.