
Himanish Kushary 2013-03-28, 03:54
Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
The EMR distributions have special versions of the S3 file system.  They
might be helpful here.

Of course, you likely aren't running those if you are seeing 5MB/s.

An extreme alternative would be to light up an EMR cluster, copy to it,
then to S3.

On Thu, Mar 28, 2013 at 4:54 AM, Himanish Kushary <[EMAIL PROTECTED]> wrote:

> I am thinking of either transferring individual folders instead of the
> entire 70 GB set of folders as a workaround, or increasing the
> "mapred.task.timeout" parameter to something like 6-7 hours (as the
> average rate of transfer to S3 seems to be 5 MB/s). Is there a better
> option for increasing throughput when transferring bulk data from HDFS
> to S3? Looking forward to suggestions.
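
For reference, the arithmetic behind the timeout figure quoted above can be sketched roughly as follows. This is only an illustration, not something posted in the thread: it assumes the worst case where a single task copies all 70 GB at 5 MB/s without reporting progress, the 1.5x headroom factor is an arbitrary choice, and the source path, bucket name, and map count are hypothetical placeholders. The two facts relied on are that mapred.task.timeout is expressed in milliseconds and that distcp accepts -m to cap the number of simultaneous maps.

```python
import math


def transfer_seconds(size_gb, rate_mb_per_s):
    """Seconds needed to move size_gb at rate_mb_per_s (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_per_s


def task_timeout_ms(size_gb, rate_mb_per_s, headroom=1.5):
    """Timeout in milliseconds (the unit of mapred.task.timeout), with headroom."""
    return math.ceil(transfer_seconds(size_gb, rate_mb_per_s) * headroom * 1000)


def distcp_command(src, dst, timeout_ms, max_maps):
    """Build the distcp argument list; -D sets the property, -m caps the maps."""
    return [
        "hadoop", "distcp",
        f"-Dmapred.task.timeout={timeout_ms}",
        "-m", str(max_maps),
        src, dst,
    ]


if __name__ == "__main__":
    seconds = transfer_seconds(70, 5)
    timeout = task_timeout_ms(70, 5)
    print(f"estimated transfer time: {seconds / 3600:.2f} hours")
    print(f"suggested mapred.task.timeout: {timeout} ms")
    # Hypothetical paths: substitute the real HDFS folder and S3 bucket.
    print(" ".join(distcp_command("hdfs:///data/folder",
                                  "s3n://example-bucket/folder",
                                  timeout, 20)))
```

At 5 MB/s the whole 70 GB needs about four hours, so a 6-7 hour timeout leaves a comfortable margin. Copying folder by folder shrinks the data each run has to move, which lowers the timeout needed in proportion.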