NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HDFS user mailing list: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput


Himanish Kushary 2013-03-28, 03:54
Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
The EMR distributions have special versions of the s3 file system.  They
might be helpful here.

Of course, you likely aren't running those if you are seeing 5 MB/s.

An extreme alternative would be to spin up an EMR cluster, copy the data to
it, and then copy from there to S3.
On Thu, Mar 28, 2013 at 4:54 AM, Himanish Kushary <[EMAIL PROTECTED]> wrote:

> As a workaround, I am considering either transferring individual folders
> instead of the entire 70 GB of folders at once, or increasing the
> "mapred.task.timeout" parameter to something like 6-7 hours (the average
> rate of transfer to S3 seems to be 5 MB/s). Is there a better option for
> increasing the throughput of bulk transfers from HDFS to S3? Looking
> forward to suggestions.
>
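A rough way to size the timeout mentioned in the question: mapred.task.timeout is expressed in milliseconds (default 600000, i.e. 10 minutes). The sketch below assumes a steady 5 MB/s, binary gigabytes (1 GB = 1024 MB), and an arbitrary 1.5x safety margin. It conservatively assumes one map task might have to move a whole folder, and it builds one distcp command per folder. The helper names, folder paths and bucket name are hypothetical. The commands are only assembled, not run, and they assume distcp accepts the generic `-D` option as ToolRunner-based Hadoop tools do.

```python
import math

MB_PER_GB = 1024       # assumption: binary units; use 1000 for decimal GB
SAFETY_FACTOR = 1.5    # hypothetical margin on top of the estimated copy time


def transfer_hours(size_gb: float, rate_mb_s: float) -> float:
    """Estimated wall-clock hours to move size_gb at a sustained rate_mb_s."""
    return size_gb * MB_PER_GB / rate_mb_s / 3600


def timeout_ms(size_gb: float, rate_mb_s: float,
               safety_factor: float = SAFETY_FACTOR) -> int:
    """mapred.task.timeout value (milliseconds) for copying size_gb, padded."""
    seconds = size_gb * MB_PER_GB / rate_mb_s * safety_factor
    return math.ceil(seconds) * 1000


def distcp_command(src: str, dst: str, size_gb: float,
                   rate_mb_s: float) -> list[str]:
    """Argument list for one distcp run with a size-based task timeout."""
    return [
        "hadoop", "distcp",
        "-D", f"mapred.task.timeout={timeout_ms(size_gb, rate_mb_s)}",
        src, dst,
    ]


def per_folder_commands(folders: dict[str, float], dst_root: str,
                        rate_mb_s: float) -> list[list[str]]:
    """One distcp command per source folder instead of one for everything."""
    commands = []
    for path, size_gb in folders.items():
        name = path.rstrip("/").rsplit("/", 1)[-1]
        dst = f"{dst_root.rstrip('/')}/{name}"
        commands.append(distcp_command(path, dst, size_gb, rate_mb_s))
    return commands


if __name__ == "__main__":
    # 70 GB at 5 MB/s is roughly 4 hours; with the margin, roughly 6 hours.
    print(round(transfer_hours(70, 5), 2), timeout_ms(70, 5))
    # Hypothetical folders and bucket, for illustration only.
    for cmd in per_folder_commands({"/data/logs": 40, "/data/events": 30},
                                   "s3n://example-bucket/backup", 5):
        print(" ".join(cmd))
```

With these assumptions the 70 GB case works out to about 3.98 hours of copying and a timeout of 21504000 ms (about 6 hours), in line with the 6-7 hour figure in the question. Splitting by folder keeps each job, and each padded timeout, proportional to the data it actually moves.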
David Parks 2013-03-28, 07:56
Himanish Kushary 2013-03-28, 10:51
David Parks 2013-03-29, 05:41
Himanish Kushary 2013-03-29, 13:18
David Parks 2013-03-29, 14:34