Re: distcp of small number of really large files
There is a new feature called concat(), which concatenates files consisting of full blocks.
So the idea is to copy the individual blocks in parallel and, once they are all copied,
concatenate them back into the original files.
You will have to write some code to do this, or modify distcp.
concat() is in 0.21 and 0.22, but not in 0.20.
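
To make the block-wise copy a bit more concrete, here is a rough sketch of just the planning step: splitting one large file into block-aligned byte ranges that separate map tasks could copy in parallel. It is plain Java with no Hadoop dependency. The class name BlockChunks, the method plan(), and the 64 MB block size are assumptions made for illustration; none of this is existing distcp code. Each range would be written to its own temporary file on the target and the temporary files then handed to concat() in offset order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of distcp): plans block-aligned copy ranges. */
final class BlockChunks {
    private BlockChunks() {}

    /**
     * Splits [0, fileLen) into ranges of blocksPerChunk full blocks each.
     * Returns {offset, length} pairs; only the last range may be shorter,
     * so every other range consists of full blocks.
     */
    static List<long[]> plan(long fileLen, long blockSize, int blocksPerChunk) {
        if (fileLen < 0 || blockSize <= 0 || blocksPerChunk <= 0) {
            throw new IllegalArgumentException("invalid file length or chunk sizing");
        }
        long chunkLen = blockSize * blocksPerChunk;
        List<long[]> chunks = new ArrayList<>();
        for (long off = 0; off < fileLen; off += chunkLen) {
            chunks.add(new long[] {off, Math.min(chunkLen, fileLen - off)});
        }
        return chunks;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;      // assumed HDFS block size
        long fileLen = 10L * 1024 * 1024 * 1024; // example: one 10 GB file
        // Each printed range would become the input of one copy task.
        for (long[] c : plan(fileLen, blockSize, 16)) {
            System.out.println("copy offset=" + c[0] + " length=" + c[1]);
        }
    }
}
```

Keeping every range a whole multiple of the block size is what lets the pieces satisfy the full-block requirement of concat(). How the final, possibly partial, range is treated may depend on the release, so check that against the version you run.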

On 5/17/2010 5:10 PM, Mridul Muralidharan wrote:
> Hi,
> Is there a way to parallelize the copy of really large files?
> From my understanding, each map in distcp currently copies one file.
> So for really large files, this would be pretty slow if number of files
> is really large.
> Thanks,
> Mridul