MapReduce user mailing list


Re: Insight on why distcp becomes slower when adding nodemanager

On 10/31/2012 02:23 PM, Michael Segel wrote:
> Not sure.
>
> Lots of things can affect your throughput.
> Networking is my first guess, which is why I asked how many
> times you've run the same test: to see whether there is a wide
> variation in timings.
>
> On Oct 31, 2012, at 7:37 AM, Alexandre Fouche
> <[EMAIL PROTECTED]> wrote:
>
>> These instances have no swap. I tried 5 or 6 times in a row, and
>> modified yarn.nodemanager.resource.memory-mb, but it did not help.
>> Later on, I'll replace OpenJDK with Oracle Java SE 1.6.31 to
>> see if it improves overall performance.
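Michael's advice to repeat the run and check how much the timings vary can be made concrete with a few lines of Python. This is a hypothetical sketch, not something from the thread: `summarize_runs` and the 10% coefficient-of-variation cutoff are assumptions chosen for illustration.

```python
import statistics


def summarize_runs(seconds, max_cv=0.10):
    """Summarize repeated wall-clock timings (in seconds) of the same job.

    max_cv is a hypothetical cutoff: if the coefficient of variation
    (stdev / mean) exceeds it, the runs vary too much to compare one
    cluster layout against another with confidence.
    """
    if len(seconds) < 2:
        raise ValueError("need at least two runs to measure variation")
    mean = statistics.mean(seconds)
    if mean <= 0:
        raise ValueError("timings must be positive")
    stdev = statistics.stdev(seconds)  # sample standard deviation
    cv = stdev / mean
    return {
        "runs": len(seconds),
        "mean": mean,
        "stdev": stdev,
        "cv": cv,
        "stable": cv <= max_cv,
    }
```

Feed it the wall-clock seconds reported by `time` for each repeated distcp run; if `stable` comes back False for a single configuration, comparing one-node and two-node results is not yet meaningful.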

How much RAM do you have, and how much of it is assigned to your Hadoop
services?
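The RAM question matters because of the yarn.nodemanager.resource.memory-mb setting discussed later in the thread: in this Hadoop 2 release the NodeManager advertises that much memory to the ResourceManager regardless of how much the host physically has. The sketch below compares the two. It is illustrative only; `container_memory_budget` is a hypothetical helper, and the OS and daemon reservations are inputs you would choose for your own hosts, not values from this thread.

```python
def container_memory_budget(physical_mb, os_reserved_mb, daemon_heaps_mb,
                            nm_resource_mb=8192):
    """Compare the container memory YARN may grant with the RAM actually free.

    nm_resource_mb mirrors yarn.nodemanager.resource.memory-mb (8192 MB by
    default). os_reserved_mb and daemon_heaps_mb (a list, e.g. DataNode and
    NodeManager JVM heaps) are assumed inputs for the host in question.
    """
    available_mb = physical_mb - os_reserved_mb - sum(daemon_heaps_mb)
    overcommit_mb = nm_resource_mb - available_mb
    return {
        "available_for_containers_mb": max(available_mb, 0),
        "overcommit_mb": max(overcommit_mb, 0),
        "overcommitted": overcommit_mb > 0,
    }
```

Even with nothing reserved for the OS or the daemons, a host with about 1.7 GB (roughly 1700 MB) comes out overcommitted by 6492 MB under the 8192 MB default.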

>> Now I am running everything on medium instances for prototyping, and
>> while this is better, I still find it excessively slow. Maybe poor
>> Hadoop performance on instances with less memory than xlarge is to be
>> expected on EC2?
Are you using Hadoop on top of EC2 or are you using the EMR service?

>>
>>
>> --
>> Alexandre Fouche
>> Lead operations engineer, cloud architect
>> http://www.cleverscale.com | @cleverscale
>>
>> On Monday 29 October 2012 at 20:04, Michael Segel wrote:
>>
>>> How many times did you test it?
>>>
>>> You need to rule out aberrations.
>>>
>>> On Oct 29, 2012, at 11:30 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>>> On your second, low-memory NM instance, did you make sure to lower the
>>>> yarn.nodemanager.resource.memory-mb property, specifically to avoid
>>>> swapping due to excessive resource grants? The default is 8 GB,
>>>> far more than the 1.7 GB you have.
>>>>
>>>> On Mon, Oct 29, 2012 at 8:42 PM, Alexandre Fouche
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> Can someone give some insight into why a distcp of 600 files of a few
>>>>> hundred bytes each from s3n:// to local HDFS takes 46 s when using one
>>>>> yarn-nodemanager EC2 instance with 16 GB of memory (which, by the way, I
>>>>> think is ridiculously long), but takes 3 min 30 s when I add a second
>>>>> yarn-nodemanager (a small instance with 1.7 GB of memory)?
>>>>> I would have expected it to be a bit faster, not to take 5x longer!
>>>>>
>>>>> I have the same issue when I stop the small instance's nodemanager and
>>>>> restart it so that it joins the processing after the job has already
>>>>> been submitted to the big nodemanager instance.
>>>>>
>>>>> I am using Cloudera's latest YARN+HDFS on Amazon (a rebranded CentOS 6).
>>>>>
>>>>> #Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
>>>>> hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
>>>>> hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>>
>>>>>
>>>>> #Staging 14:39:51 root@resourcemanager:hadoop-yarn:
>>>>> HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce time hadoop distcp
>>>>> -overwrite s3n://xxx:[EMAIL PROTECTED]ev/* hdfs:///tmp/something/a
>>>>>
>>>>> 12/10/29 14:40:12 INFO tools.DistCp: Input Options:
>>>>> DistCpOptions{atomicCommit=false, syncFolder=false,
>>>>> deleteMissing=false,
>>>>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>>>>> copyStrategy='uniformsize', sourceFileListing=null,
>>>>> sourcePaths=[s3n://xxx:[EMAIL PROTECTED]ev/*],
>>>>> targetPath=hdfs:/tmp/something/a}
>>>>> 12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated.
>>>>> Instead, use mapreduce.task.io.sort.mb
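The DistCp log shows maxMaps=20 and copyStrategy='uniformsize'. The uniform-size strategy aims to give each map a roughly equal share of the total bytes to copy. The sketch below is a simplified greedy model of that idea, written only for illustration: `uniform_size_splits` is hypothetical and is not DistCp's code (the real logic lives in DistCp's UniformSizeInputFormat), and treating all 600 files as the same size is an assumption.

```python
import math


def uniform_size_splits(file_sizes, max_maps):
    """Group files, in listing order, into at most max_maps splits of similar size.

    Simplified model: the per-split target is ceil(total / max_maps) bytes.
    A split is closed when adding the next file would exceed the target,
    except that the last allowed split absorbs all remaining files.
    Returns a list of lists of file indices.
    """
    if max_maps < 1:
        raise ValueError("max_maps must be at least 1")
    total = sum(file_sizes)
    target = max(1, math.ceil(total / max_maps))
    splits, current, current_bytes = [], [], 0
    for i, size in enumerate(file_sizes):
        can_close = len(splits) < max_maps - 1  # keep the split count bounded
        if current and can_close and current_bytes + size > target:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        splits.append(current)
    return splits
```

With 600 equally sized files and 20 maps, the model yields 20 splits of 30 files each, so each map copies only about 30 files of a few hundred bytes, on the order of kilobytes. For reference, the reported slowdown is 3 min 30 s = 210 s versus 46 s, or 210 / 46 ≈ 4.6x.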
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci