About reducer's Shuffle JVM Heap Size
Hi, all.
I'm debugging Hadoop's source code and have found a setting I can't make
sense of. In hadoop-0.20.2 and hadoop-1.0.3, the reducer's shuffle buffer
size cannot exceed 2048MB (i.e., Integer.MAX_VALUE bytes). As a result,
even though the reducer's JVM heap can be set larger than 2048MB (e.g.,
mapred.child.java.opts=-Xmx4000m), the heap actually available to the
shuffle buffer is at most "2048MB * maxInMemCopyUse (default 0.7)", not
"4000MB * maxInMemCopyUse".
I don't think this is a reasonable setting for machines with large memory.
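To make the effect concrete, here is a quick back-of-the-envelope sketch
(plain Java, not Hadoop code; the class name and the -Xmx4000m figure are
just assumptions for illustration):

       public class ShuffleCapDemo {
         public static void main(String[] args) {
           long jvmHeap = 4000L * 1024 * 1024;   // roughly what -Xmx4000m allows
           float maxInMemCopyUse = 0.70f;        // default mapred.job.shuffle.input.buffer.percent

           // What ReduceTask effectively computes: the heap is clamped to
           // Integer.MAX_VALUE (~2048MB) before the percentage is applied.
           long actual = (long) (Math.min(jvmHeap, Integer.MAX_VALUE) * maxInMemCopyUse);

           // What one might expect on a large-memory machine:
           long expected = (long) (jvmHeap * maxInMemCopyUse);

           System.out.printf("actual   = %d bytes (~%d MB)%n", actual, actual >> 20);
           System.out.printf("expected = %d bytes (~%d MB)%n", expected, expected >> 20);
           // prints actual ~1433 MB vs. expected ~2800 MB
         }
       }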

The following code, taken from org.apache.hadoop.mapred.ReduceTask, shows
the concrete computation. I'm wondering why maxSize is declared as "long"
but assigned through an "(int)" cast. I've marked the relevant lines with "-->".

Thanks for any suggestions.
---------------------------------------------------------------------------------------------------------------------------
       private final long maxSize;
       private final long maxSingleShuffleLimit;

       private long size = 0;

       private Object dataAvailable = new Object();
       private long fullSize = 0;
       private int numPendingRequests = 0;
       private int numRequiredMapOutputs = 0;
       private int numClosed = 0;
       private boolean closed = false;

       public ShuffleRamManager(Configuration conf) throws IOException {
         final float maxInMemCopyUse =
             conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
         if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
           throw new IOException("mapred.job.shuffle.input.buffer.percent" +
                                 maxInMemCopyUse);
         }
         // Allow unit tests to fix Runtime memory
   -->   maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
   -->       (int)Math.min(Runtime.getRuntime().maxMemory(),
   -->                     Integer.MAX_VALUE))
   -->     * maxInMemCopyUse);
         maxSingleShuffleLimit =
             (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
         LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
                  ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
       }
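For what it's worth, here is a small plain-Java sketch (my own demo class,
not Hadoop code) of what the "(int)" cast does: in Java, narrowing a float
larger than Integer.MAX_VALUE to int saturates at Integer.MAX_VALUE
(JLS 5.1.3), so even though maxSize is declared as a long, the cast pins it
below ~2GB:

       public class IntCastDemo {
         public static void main(String[] args) {
           float fourGB = 4L * 1024 * 1024 * 1024;  // 4294967296 as a float

           long viaInt  = (int) fourGB;   // the cast style ReduceTask uses
           long viaLong = (long) fourGB;  // what a pure-long computation would keep

           System.out.println("viaInt  = " + viaInt);   // 2147483647 (clamped)
           System.out.println("viaLong = " + viaLong);  // 4294967296
         }
       }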
Luke Lu 2012-11-02, 22:45