MapReduce >> mail # user >> Insight on why distcp becomes slower when adding nodemanager


Alexandre Fouche 2012-10-29, 15:12
Re: Insight on why distcp becomes slower when adding nodemanager
On your second, low-memory NM instance, did you lower the
yarn.nodemanager.resource.memory-mb property so that the NodeManager
does not grant more memory than the machine has and start swapping?
The default is 8 GB, far more than the 1.7 GB you have.
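
To illustrate the suggestion above, a minimal yarn-site.xml sketch for the small NodeManager host follows. The value 1024 is only an assumption for illustration, not a tuned recommendation: it is chosen to sit below the 1.7 GB of physical memory and leave some headroom for the OS and the DataNode and NodeManager daemons on the same machine. The NodeManager has to be restarted for the change to take effect.

```xml
<!-- yarn-site.xml on the small (1.7 GB) NodeManager instance only -->
<configuration>
  <property>
    <!-- Total memory, in MB, that this NodeManager offers to containers.
         The default is 8192; 1024 here is an illustrative assumption. -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
</configuration>
```

The 16 GB instance can keep its own, larger value, since this property is read per NodeManager.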

On Mon, Oct 29, 2012 at 8:42 PM, Alexandre Fouche
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> Can someone give some insight into why a "distcp" of 600 files of a few
> hundred bytes each from s3n:// to local HDFS takes 46s when using one
> yarn-nodemanager EC2 instance with 16GB memory (which, by the way, I think
> is laughably long), and 3min30s after adding a second yarn-nodemanager (a
> small instance with 1.7GB memory)?
> I would have expected it to be a bit faster, not 5x longer!
>
> I have the same issue when I stop the small instance's nodemanager and
> restart it so that it joins the processing after the job has already been
> submitted to the big nodemanager instance.
>
> I am using Cloudera's latest YARN+HDFS on Amazon (rebranded CentOS 6)
>
>     #Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
>     hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
>     hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>     hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>
>
>     #Staging 14:39:51 root@resourcemanager:hadoop-yarn:
> HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce time hadoop distcp -overwrite
> s3n://xxx:[EMAIL PROTECTED]ev/* hdfs:///tmp/something/a
>
>     12/10/29 14:40:12 INFO tools.DistCp: Input Options:
> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
> copyStrategy='uniformsize', sourceFileListing=null,
> sourcePaths=[s3n://xxx:[EMAIL PROTECTED]ev/*],
> targetPath=hdfs:/tmp/something/a}
>     12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated.
> Instead, use mapreduce.task.io.sort.mb
>     12/10/29 14:40:18 WARN conf.Configuration: io.sort.factor is deprecated.
> Instead, use mapreduce.task.io.sort.factor
>     12/10/29 14:40:19 INFO mapreduce.JobSubmitter: number of splits:15
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.jar is deprecated.
> Instead, use mapreduce.job.jar
>     12/10/29 14:40:19 WARN conf.Configuration:
> mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.reduce.tasks is
> deprecated. Instead, use mapreduce.job.reduces
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.value.class
> is deprecated. Instead, use mapreduce.map.output.value.class
>     12/10/29 14:40:19 WARN conf.Configuration: mapreduce.map.class is
> deprecated. Instead, use mapreduce.job.map.class
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.job.name is
> deprecated. Instead, use mapreduce.job.name
>     12/10/29 14:40:19 WARN conf.Configuration: mapreduce.inputformat.class
> is deprecated. Instead, use mapreduce.job.inputformat.class
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.output.dir is
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>     12/10/29 14:40:19 WARN conf.Configuration: mapreduce.outputformat.class
> is deprecated. Instead, use mapreduce.job.outputformat.class
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.map.tasks is
> deprecated. Instead, use mapreduce.job.maps
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.mapoutput.key.class is
> deprecated. Instead, use mapreduce.map.output.key.class
>     12/10/29 14:40:19 WARN conf.Configuration: mapred.working.dir is
> deprecated. Instead, use mapreduce.job.working.dir

Harsh J