MapReduce >> mail # user >> Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?


Olivier Varene - echo 2012-12-06, 16:01
Re: Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?
Olivier,

 Sorry, missed this.

 The historical reason, if I remember right, is that we used to have a single byte buffer and hence the limit.

 We should definitely remove it now since we don't use a single buffer. Mind opening a jira?

 http://wiki.apache.org/hadoop/HowToContribute

thanks!
Arun
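
For background on the single-buffer limit mentioned above: Java arrays are indexed by int, so a single byte[] can never hold more than Integer.MAX_VALUE bytes (about 2 GB). A minimal sketch of that ceiling follows; it is illustrative only and not taken from the Hadoop source (the class name and sizes here are made up):

public class SingleBufferLimit {
    public static void main(String[] args) {
        // Array lengths are ints, so this is the hard ceiling for one backing buffer.
        System.out.println("Largest possible byte[]: " + Integer.MAX_VALUE + " bytes (~2 GB)");

        // Asking for more cannot work: the requested size must fit in an int,
        // and casting a larger long down to int wraps around to a negative value.
        long wanted = 3L * 1024 * 1024 * 1024; // 3 GB
        try {
            byte[] buf = new byte[(int) wanted];
            System.out.println("Allocated " + buf.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("A single array cannot back a >2 GB buffer: " + e);
        }
    }
}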

On Dec 6, 2012, at 8:01 AM, Olivier Varene - echo wrote:

> anyone ?
>
> Begin forwarded message:
>
>> From: Olivier Varene - echo <[EMAIL PROTECTED]>
>> Subject: ReduceTask > ShuffleRamManager : Java Heap memory error
>> Date: December 4, 2012, 09:34:06 CET
>> To: [EMAIL PROTECTED]
>> Reply-To: [EMAIL PROTECTED]
>>
>>
>> Hi to all,
>> first of all, many thanks for the quality of the work you are doing.
>>
>> I am facing a bug with memory management at shuffle time; I regularly get
>>
>> Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1612)
>>
>>
>> Reading the code in the org.apache.hadoop.mapred.ReduceTask.java file,
>> it seems the "ShuffleRamManager" limits the maximum RAM allocation to Integer.MAX_VALUE * maxInMemCopyUse:
>>
>> maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
>>            (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
>>          * maxInMemCopyUse);
>>
>> Why is that so?
>> And why is the value cast down to an int, when its underlying type is long?
>>
>> Does it mean that a Reduce Task cannot take advantage of more than 2 GB of memory?
>>
>> To explain my use case a little bit:
>> I am processing some 2700 maps (each working on a 128 MB block of data), and when the reduce phase starts, it sometimes stumbles over Java heap memory issues.
>>
>> configuration is: java 1.6.0-27
>> hadoop 0.20.2
>> -Xmx1400m
>> io.sort.mb 400
>> io.sort.factor 25
>> io.sort.spill.percent 0.80
>> mapred.job.shuffle.input.buffer.percent 0.70
>> ShuffleRamManager: MemoryLimit=913466944, MaxSingleShuffleLimit=228366736
>>
>> I will decrease mapred.job.shuffle.input.buffer.percent to limit the errors,
>> but I am not fully confident about the scalability of the process.
>>
>> Any help would be welcome
>>
>> once again, many thanks
>> Olivier
>>
>>
>> P.S.: sorry if I misunderstood the code; any explanation would be really welcome
>>

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
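
The MemoryLimit and MaxSingleShuffleLimit figures quoted above line up with the capped computation in ReduceTask. A minimal sketch of the arithmetic, assuming Runtime.maxMemory() came out around 1,304,952,832 bytes for -Xmx1400m (back-computed from the logged values; the exact figure is JVM dependent) and a single-shuffle-segment fraction of 0.25 as in 0.20.x:

public class ShuffleLimitSketch {
    public static void main(String[] args) {
        long maxMemory = 1304952832L;        // assumed Runtime.getRuntime().maxMemory() for -Xmx1400m
        float inputBufferPercent = 0.70f;    // mapred.job.shuffle.input.buffer.percent
        float singleSegmentFraction = 0.25f; // assumed single-shuffle-segment fraction (0.20.x)

        // The cast to int against Integer.MAX_VALUE is where the ~2 GB ceiling comes from.
        int memoryLimit = (int) (Math.min(maxMemory, Integer.MAX_VALUE) * inputBufferPercent);
        int maxSingleShuffleLimit = (int) (memoryLimit * singleSegmentFraction);

        System.out.println("MemoryLimit           = " + memoryLimit);           // ~913466944
        System.out.println("MaxSingleShuffleLimit = " + maxSingleShuffleLimit); // ~228366736

        // Even a much larger heap cannot push the budget past the int ceiling:
        long bigHeap = 16L * 1024 * 1024 * 1024; // 16 GB
        int capped = (int) (Math.min(bigHeap, Integer.MAX_VALUE) * inputBufferPercent);
        System.out.println("16 GB heap: in-memory shuffle budget still only " + capped + " bytes");
    }
}

With this particular configuration the ceiling is not yet the binding constraint (MemoryLimit is only about 870 MB); the Integer.MAX_VALUE cap only starts to matter once the reducer heap grows past roughly 2 GB.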
Olivier Varene - echo 2012-12-06, 22:14
Olivier Varene - echo 2012-12-10, 13:22