MapReduce, mail # user - Insight on why distcp becomes slower when adding nodemanager


Alexandre Fouche 2012-10-29, 15:12
Harsh J 2012-10-29, 16:30
Michael Segel 2012-10-29, 19:04
Alexandre Fouche 2012-10-31, 12:37
Michael Segel 2012-10-31, 19:23
Re: Insight on why distcp becomes slower when adding nodemanager
Marcos Ortiz 2012-10-31, 20:27

On 10/31/2012 02:23 PM, Michael Segel wrote:
> Not sure.
>
> Lots of things can affect your throughput.
> Networking is my first guess. Which is why I asked about the number of
> times you've run the same test to see if there is a wide variation in
> timings.
>
> On Oct 31, 2012, at 7:37 AM, Alexandre Fouche <[EMAIL PROTECTED]> wrote:
>
>> These instances have no swap. I tried 5 or 6 times in a row, and
>> modified yarn.nodemanager.resource.memory-mb, but it did not help.
>> Later on, I'll replace OpenJDK with Oracle Java SE 1.6.31 to
>> see if it improves overall performance.
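
A minimal sketch for comparing repeated runs, since a single timing says little if the numbers vary widely from one run to the next. The helper name time_runs is made up for illustration, it discards the command's output, and it only has one-second resolution, which should be enough for runs that take between 46s and 3mn30s:

```shell
# Run a command N times and print the wall-clock time of each run,
# followed by the fastest and slowest run.
# Usage: time_runs N command [args...]
time_runs() (
  n="$1"; shift
  i=1; min=""; max=""
  while [ "$i" -le "$n" ]; do
    start=$(date +%s)
    "$@" > /dev/null 2>&1          # output is discarded in this sketch
    elapsed=$(( $(date +%s) - start ))
    echo "run $i: ${elapsed}s"
    # Track the smallest and largest elapsed time seen so far.
    if [ -z "$min" ] || [ "$elapsed" -lt "$min" ]; then min=$elapsed; fi
    if [ -z "$max" ] || [ "$elapsed" -gt "$max" ]; then max=$elapsed; fi
    i=$(( i + 1 ))
  done
  echo "min=${min}s max=${max}s"
)
```

Something like `time_runs 5 hadoop distcp -overwrite <source> hdfs:///tmp/something/a` would then show the spread, with <source> standing in for the s3n:// path and HADOOP_MAPRED_HOME exported beforehand.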

How much RAM do you have, and how much of it is assigned to your Hadoop
services?
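
One rough way to check this on each node, assuming a Linux host and that the NodeManager reads /etc/hadoop/conf/yarn-site.xml (that path is an assumption, adjust it to your install). The helper names are hypothetical, and the sed expression is a naive text match rather than a real XML parser:

```shell
# Print the total physical RAM in MB (Linux only, reads /proc/meminfo).
mem_total_mb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Print the <value> of a property from a Hadoop *-site.xml file.
# Usage: hadoop_prop FILE PROPERTY_NAME
# Assumes <value> directly follows <name> inside the <property> element.
# Prints nothing if the property is not set in the file.
hadoop_prop() {
  # Flatten the file to a single line so the match can span line breaks.
  tr -d '\n' < "$1" |
    sed -n "s|.*<name>[[:space:]]*$2[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*|\1|p"
}

# Example (the config path is an assumption):
#   mem_total_mb
#   hadoop_prop /etc/hadoop/conf/yarn-site.xml yarn.nodemanager.resource.memory-mb
```

If the second call prints nothing, the property is not set in that file and the NodeManager falls back to its default, which Harsh gives as 8 GB further down the thread.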

>> Now I am running everything on medium instances for prototyping, and
>> while this is better, I still find it unreasonably slow. Is poor
>> Hadoop performance on instances with less memory than xlarge to be
>> expected on EC2?
Are you using Hadoop on top of EC2 or are you using the EMR service?

>>
>>
>> --
>> Alexandre Fouche
>> Lead operations engineer, cloud architect
>> http://www.cleverscale.com | @cleverscale
>> Sent with Sparrow <http://www.sparrowmailapp.com/?sig>
>>
>> On Monday 29 October 2012 at 20:04, Michael Segel wrote:
>>
>>> How many times did you test it?
>>>
>>> We need to rule out aberrations.
>>>
>>> On Oct 29, 2012, at 11:30 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>>> On your second low-memory NM instance, did you make sure to lower the
>>>> yarn.nodemanager.resource.memory-mb property, specifically to avoid
>>>> swapping due to excessive resource grants? The default offered is 8 GB
>>>> (far more than the 1.7 GB you have).
>>>>
>>>> On Mon, Oct 29, 2012 at 8:42 PM, Alexandre Fouche <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> Can someone give some insight into why a "distcp" of 600 files of a few
>>>>> hundred bytes each from s3n:// to local HDFS takes 46s when using one
>>>>> yarn-nodemanager EC2 instance with 16GB memory (which, by the way, I
>>>>> think is laughably long), and takes 3mn30s when adding a second
>>>>> yarn-nodemanager (a small instance with 1.7GB memory)?
>>>>> I would have expected it to be a bit faster, not 5x longer!
>>>>>
>>>>> I have the same issue when I stop the small instance's nodemanager and
>>>>> restart it so that it joins the processing after the job has already
>>>>> been submitted to the big nodemanager instance.
>>>>>
>>>>> I am using the latest Cloudera YARN+HDFS on Amazon (rebranded CentOS 6)
>>>>>
>>>>> #Staging 14:58:04 root@datanode2:hadoop-yarn: rpm -qa |grep hadoop
>>>>> hadoop-hdfs-datanode-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-mapreduce-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-0.20-mapreduce-0.20.2+1261-1.cdh4.1.1.p0.4.el6.x86_64
>>>>> hadoop-yarn-nodemanager-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-mapreduce-historyserver-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-hdfs-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-client-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>> hadoop-yarn-2.0.0+545-1.cdh4.1.1.p0.5.el6.x86_64
>>>>>
>>>>>
>>>>> #Staging 14:39:51 root@resourcemanager:hadoop-yarn:
>>>>> HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce time hadoop distcp -overwrite s3n://xxx:[EMAIL PROTECTED]ev/* hdfs:///tmp/something/a
>>>>>
>>>>> 12/10/29 14:40:12 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[s3n://xxx:[EMAIL PROTECTED]ev/*], targetPath=hdfs:/tmp/something/a}
>>>>> 12/10/29 14:40:18 WARN conf.Configuration: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>

Alexandre Fouche 2012-11-01, 20:01