MapReduce >> mail # dev >> About reducer's Shuffle JVM Heap Size


Re: About reducer's Shuffle JVM Heap Size
The task code was optimized for the 32-bit JVM (yes, in production people
use a 64-bit JVM for servers and a 32-bit JVM for tasks), because it's
more memory efficient at the same Xmx, which by default is smaller than
2GiB. Feel free to open a jira for the improvement. I'd preserve the
32-bit JVM optimization by checking the os.arch property.
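
A minimal sketch of that os.arch check (hypothetical wiring, not an actual
Hadoop patch; os.arch is a standard JVM system property, but the class and
method names below are made up for illustration):

    // Hypothetical sketch: keep the int-sized buffer on 32-bit JVMs,
    // but let 64-bit JVMs size the shuffle buffer past Integer.MAX_VALUE.
    public class ShuffleBufferSizing {

      // Typical os.arch values: "amd64"/"x86_64" on 64-bit JVMs,
      // "i386"/"x86" on 32-bit JVMs.
      static boolean is64BitJvm() {
        return System.getProperty("os.arch", "").contains("64");
      }

      static long computeMaxShuffleSize(long totalMemBytes, float maxInMemCopyUse) {
        if (is64BitJvm()) {
          // 64-bit JVM: size the buffer off the real heap, no clamp needed.
          return (long) (totalMemBytes * maxInMemCopyUse);
        }
        // 32-bit JVM: preserve the original int-based optimization.
        long clamped = Math.min(totalMemBytes, Integer.MAX_VALUE);
        return (int) (clamped * maxInMemCopyUse);
      }

      public static void main(String[] args) {
        long heap = 4000L * 1024 * 1024; // e.g. mapred.child.java.opts=-Xmx4000m
        System.out.println(computeMaxShuffleSize(heap, 0.70f));
      }
    }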

On Sat, Oct 27, 2012 at 7:35 AM, Lijie Xu <[EMAIL PROTECTED]> wrote:
> Hi, all.
> I'm debugging Hadoop's source code and have found a puzzling setting.
> In hadoop-0.20.2 and hadoop-1.0.3, reducer's shuffle buffer size cannot
> exceed 2048MB (i.e., Integer.MAX_VALUE). As a result, although the
> reducer's JVM heap size can be set to more than 2048MB (e.g.,
> mapred.child.java.opts=-Xmx4000m), the heap size used for the shuffle
> buffer is at most "2048MB * maxInMemCopyUse (default 0.7)", not "4000MB *
> maxInMemCopyUse".
> I think this is not a reasonable setting for large-memory machines.
>
> The following code taken from "org.apache.hadoop.mapred.ReduceTask" shows
> the concrete algorithm.
> I'm wondering why maxSize is declared as "long" but assigned with an
> "(int)" cast. I point out the corresponding code with "-->".
>
> Thanks for any suggestion.
> ---------------------------------------------------------------------------------------------------------------------------
>       private final long maxSize;
>       private final long maxSingleShuffleLimit;
>
>       private long size = 0;
>
>       private Object dataAvailable = new Object();
>       private long fullSize = 0;
>       private int numPendingRequests = 0;
>       private int numRequiredMapOutputs = 0;
>       private int numClosed = 0;
>       private boolean closed = false;
>
>       public ShuffleRamManager(Configuration conf) throws IOException {
>         final float maxInMemCopyUse =
>           conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
>         if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
>           throw new IOException("mapred.job.shuffle.input.buffer.percent" +
>                                 maxInMemCopyUse);
>         }
>         // Allow unit tests to fix Runtime memory
> -->     maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
> -->         (int)Math.min(Runtime.getRuntime().maxMemory(),
> -->                       Integer.MAX_VALUE))
> -->         * maxInMemCopyUse);
>         maxSingleShuffleLimit =
>             (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>         LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
>                  ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
>       }
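
For reference, a standalone snippet (not from the thread) showing the cap the
original poster describes: with -Xmx4000m, the clamp to Integer.MAX_VALUE
limits the shuffle buffer to roughly 1433MB rather than 4000MB * 0.7 = 2800MB.

    // Standalone illustration of the cap in hadoop-1.0.3's ShuffleRamManager:
    // the heap is clamped to Integer.MAX_VALUE before scaling by the buffer
    // percent, so a 4000MB heap gains nothing over a 2048MB one.
    public class ShuffleCapDemo {
      public static void main(String[] args) {
        long heapBytes = 4000L * 1024 * 1024; // -Xmx4000m
        float maxInMemCopyUse = 0.70f;        // default mapred.job.shuffle.input.buffer.percent

        // What the code above effectively computes: clamp first, then scale.
        long clampedMaxSize = (int) (Math.min(heapBytes, Integer.MAX_VALUE) * maxInMemCopyUse);

        // What a pure-long computation would allow.
        long longMaxSize = (long) (heapBytes * maxInMemCopyUse);

        System.out.println("clamped maxSize = " + (clampedMaxSize >> 20) + " MB"); // ~1433 MB
        System.out.println("long    maxSize = " + (longMaxSize >> 20) + " MB");    // ~2800 MB
      }
    }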