RE: How to configure multiple reduce jobs in hadoop 2.2.0
The simple explanation is that a Java application's memory use is not
limited to the heap.
As an example, see Tom White's Hadoop: The Definitive Guide, page 323: the
job's own memory also includes native libraries, Java's permgen space, etc.
I encourage you to read more about memory management in Java applications
(not specifically for Hadoop).
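To make the point concrete, here is a minimal Java sketch that asks the running JVM how much heap and non-heap memory it is using, through the standard java.lang.management API. The class name JvmMemoryReport and the toMb helper are made up for illustration. Memory allocated by native libraries and thread stacks is generally not counted in either figure, so the real footprint of the process is larger still.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of Hadoop.
class JvmMemoryReport {

    // Converts a byte count to whole megabytes for readable output.
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // Heap: the part bounded by -Xmx.
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // Non-heap: class metadata (permgen on older JVMs), code cache, etc.
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        System.out.println("heap used (MB):     " + toMb(heap.getUsed()));
        // getMax() returns -1 when no maximum is defined.
        System.out.println("heap max (bytes):   " + heap.getMax());
        System.out.println("non-heap used (MB): " + toMb(nonHeap.getUsed()));
    }
}
```

Running this inside any JVM shows a non-heap figure above zero, which is memory the process needs on top of whatever -Xmx allows.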
From: java8964 [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 17, 2014 9:39 AM
To: [EMAIL PROTECTED]
Subject: RE: How to configure multiple reduce jobs in hadoop 2.2.0
I read this blog, and have the following questions:
What is the relationship between "mapreduce.map.memory.mb" and
In the blog, it gives the following settings as example:
For our example cluster, we have the minimum RAM for a Container
(yarn.scheduler.minimum-allocation-mb) = 2 GB. We'll thus assign 4 GB for
Map task Containers, and 8 GB for Reduce task Containers.
Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size
should be set to lower than the Map and Reduce memory defined above, so that
they are within the bounds of the Container memory allocated by YARN.
The above settings configure the upper limit of the physical RAM that Map
and Reduce tasks will use.
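As a rough illustration of the rule quoted above, the following Java sketch parses the -Xmx value out of a java.opts string and compares it with the container size. The 4096 MB container is the 4 GB map figure from the quoted example and -Xmx3072m is the 3G heap discussed in this message; the class and method names are hypothetical and nothing here calls Hadoop itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Hadoop.
class ContainerHeapCheck {

    // Matches -Xmx followed by a number and an optional k/m/g suffix.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the -Xmx value in MB, or -1 if the options contain no -Xmx.
    static long maxHeapMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1L;
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "g": return value * 1024L;
            case "m": return value;
            case "k": return value / 1024L;
            default:  return value / (1024L * 1024L); // no suffix means bytes
        }
    }

    // Memory left in the container for everything that is not Java heap.
    static long headroomMb(long containerMb, String javaOpts) {
        long heapMb = maxHeapMb(javaOpts);
        if (heapMb < 0) {
            throw new IllegalArgumentException("no -Xmx found in: " + javaOpts);
        }
        return containerMb - heapMb;
    }

    public static void main(String[] args) {
        long mapContainerMb = 4096L;   // mapreduce.map.memory.mb (4 GB, from the example)
        String mapOpts = "-Xmx3072m";  // mapreduce.map.java.opts (3G heap, from this thread)

        long headroom = headroomMb(mapContainerMb, mapOpts);
        System.out.println("headroom for non-heap memory (MB): " + headroom);
        if (headroom <= 0) {
            System.out.println("warning: heap does not fit inside the container");
        }
    }
}
```

With these values the headroom is 1024 MB, which is the "additional 1G" asked about below.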
I am not sure why the "mapreduce.map.java.opts" should be lower than
"mapreduce.map.memory.mb", as suggested above, or how it makes sense.
If the mapper task's JVM is set with a maximum heap size of 3G, and the
maximum memory of the map task's Container is set to 4G, then what is this
additional 1G of memory used for?
Basically my questions are:
1) Why do we have these 2 configuration settings? I would have thought one
should be enough.
2) For the above settings, my understanding is that, from the application,
the max memory I can use for a mapper task is 3G, no matter what I ask for,
right? Does the additional 1G mean any memory I can ask for outside of the JVM
Date: Fri, 17 Jan 2014 15:16:28 +0530
Subject: Re: How to configure multiple reduce jobs in hadoop 2.2.0
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Also check this
On Fri, Jan 17, 2014 at 2:56 PM, Silvina Caíno Lores <[EMAIL PROTECTED]>
Also, you will be limited by your container configuration in yarn-site.xml
and mapred-site.xml; check this to
understand how resource management works.
Basically, you can set the number of reducers you want, but you are limited
to the number the system can actually hold under the configuration you have
set.
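A back-of-the-envelope way to see that limit, as a Java sketch: divide the memory each node offers to YARN by the size of one reduce container and multiply by the number of nodes. The 3 nodes come from the original question and the 8192 MB reduce container from the example quoted elsewhere in this thread; the 16384 MB per node (yarn.nodemanager.resource.memory-mb) is an assumed value, and the class name is hypothetical. The estimate ignores map containers and the ApplicationMaster, so treat it as an upper bound only.

```java
// Hypothetical helper class, not part of Hadoop.
class ReducerCapacity {

    // Upper bound on containers of the given size that fit on the cluster at once.
    static long maxConcurrentContainers(int nodes, long nodeMemoryMb, long containerMb) {
        if (containerMb <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        // Integer division: a partial container on a node does not count.
        return nodes * (nodeMemoryMb / containerMb);
    }

    public static void main(String[] args) {
        int nodes = 3;                  // cluster size from the original question
        long nodeMemoryMb = 16384L;     // ASSUMED yarn.nodemanager.resource.memory-mb
        long reduceContainerMb = 8192L; // mapreduce.reduce.memory.mb from the example

        System.out.println("reduce containers that fit at once: "
                + maxConcurrentContainers(nodes, nodeMemoryMb, reduceContainerMb));
    }
}
```

With these assumed numbers at most 6 reduce containers fit at the same time, however many reducers the job asks for.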
Hope it helps.
On 16 January 2014 08:54, sudhakara st <[EMAIL PROTECTED]> wrote:
Using -D mapreduce.job.reduces=<number> with a fixed number of reducers will
spawn that many reducers for a job.
On Thu, Jan 16, 2014 at 12:45 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
I have a 3-node cluster and have a MapReduce job running on it. I have 8
data blocks spread across all 3 nodes. While running the MapReduce job I
can see 8 map tasks running; however, there is only 1 reduce job. Is there a
way to configure multiple reduce jobs?