MapReduce, mail # user - RE: How to configure multiple reduce jobs in hadoop 2.2.0


java8964 2014-01-17, 15:39
I read this blog, and have the following questions:
What is the relationship between "mapreduce.map.memory.mb" and "mapreduce.map.java.opts"?
The blog gives the following settings as an example:
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.

In mapred-site.xml:

    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3072m</value>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6144m</value>

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
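In the blog's numbers the heap is exactly 75% of the container in both cases (3072/4096 and 6144/8192). The container size (mapreduce.*.memory.mb) is what YARN reserves for the whole task process, and with physical-memory checking enabled the NodeManager kills a process tree that grows past it; -Xmx only bounds the Java heap, while the JVM also needs memory outside the heap (thread stacks, class metadata, code cache, direct buffers). The sketch below derives java.opts from a container size; the 0.75 ratio is an assumption read off the blog's example, not a Hadoop default.

```java
// Sketch: derive a task JVM heap (-Xmx) from its YARN container size.
// ASSUMPTION: HEAP_FRACTION = 0.75 is taken from the blog's example
// (3072/4096 and 6144/8192); it is a rule of thumb, not a Hadoop setting.
public class ContainerHeap {
    static final double HEAP_FRACTION = 0.75;

    // Heap in MB for a container of containerMb; the rest is headroom
    // for JVM memory that lives outside the heap.
    public static int heapMb(int containerMb) {
        return (int) (containerMb * HEAP_FRACTION);
    }

    // A value suitable for mapreduce.map.java.opts / mapreduce.reduce.java.opts.
    public static String javaOpts(int containerMb) {
        return "-Xmx" + heapMb(containerMb) + "m";
    }

    public static void main(String[] args) {
        int[] containers = {4096, 8192}; // map and reduce container sizes from the blog
        for (int mb : containers) {
            System.out.printf("memory.mb=%d -> java.opts=%s (non-heap headroom %d MB)%n",
                    mb, javaOpts(mb), mb - heapMb(mb));
        }
    }
}
```

For the blog's values this reproduces -Xmx3072m and -Xmx6144m, leaving 1024 MB and 2048 MB of non-heap headroom respectively.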
I am not sure why "mapreduce.map.java.opts" should be set lower than "mapreduce.map.memory.mb", as suggested above, or why that makes sense.
If the JVM of the map task is given a max heap of 3G, and the Container for the map task is capped at 4G, then what is the additional 1G of memory used for?
Basically my questions are:
1) Why do we have these 2 configuration settings? I thought one should be enough.
2) With the above settings, my understanding is that the max memory my application can use in a map task is 3G, no matter what I ask for, right? Does the additional 1G mean memory I can ask for outside of the JVM heap?
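One way to check question 2 from inside a task is to ask the JVM for its heap ceiling with Runtime.getRuntime().maxMemory(). This is a minimal sketch: with -Xmx3072m it should report roughly 3 GB (some JVMs report slightly less than -Xmx), and memory allocated outside the heap is not included in this number even though it still counts against the container.

```java
// Sketch: report the heap ceiling the current JVM was started with.
// Runtime.maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
public class HeapCeiling {
    // Maximum heap in MB as seen by this JVM (reflects -Xmx when it is set).
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```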
Thanks
Yong
Date: Fri, 17 Jan 2014 15:16:28 +0530
Subject: Re: How to configure multiple reduce jobs in hadoop 2.2.0
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Also check this
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/

On Fri, Jan 17, 2014 at 2:56 PM, Silvina Caíno Lores <[EMAIL PROTECTED]> wrote:

Also, you are limited by your container configuration in yarn-site.xml and mapred-site.xml; check THIS to understand how resource management works.

Basically, you can set as many reducers as you want, but how many can actually run at once is limited by the resources your configuration makes available.
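The limit Silvina describes can be estimated with simple arithmetic. This sketch assumes memory is the only constrained resource and that each container request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb (as the CapacityScheduler does); the 24576 MB node size is a hypothetical value, not one from this thread, and the ApplicationMaster and any running map containers are ignored.

```java
// Sketch: how many reduce containers can run concurrently on a cluster.
// ASSUMPTIONS: memory-only scheduling, requests rounded up to a multiple of
// the minimum allocation, and a hypothetical NodeManager size in main().
public class ReducerCapacity {
    // Round a container request up to the next multiple of the minimum allocation.
    public static int normalizeMb(int requestMb, int minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    // How many containers of requestMb fit into one NodeManager's memory.
    public static int containersPerNode(int nodeMb, int requestMb, int minAllocMb) {
        return nodeMb / normalizeMb(requestMb, minAllocMb);
    }

    public static void main(String[] args) {
        int nodeMb = 24576;     // hypothetical yarn.nodemanager.resource.memory-mb
        int minAllocMb = 2048;  // yarn.scheduler.minimum-allocation-mb from the blog
        int reduceMb = 8192;    // mapreduce.reduce.memory.mb from the blog
        int nodes = 3;          // cluster size from the original question
        int perNode = containersPerNode(nodeMb, reduceMb, minAllocMb);
        System.out.println("reducers per node: " + perNode
                + ", cluster-wide: " + perNode * nodes);
    }
}
```

Reduce tasks beyond this number are not rejected; they wait until earlier containers finish and free their memory.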
Hope it helps.

Regards,
Silvina

On 16 January 2014 08:54, sudhakara st <[EMAIL PROTECTED]> wrote:
Hello Ashish,

Passing “-D mapreduce.job.reduces=number” with a fixed number of reducers will spawn that many reduce tasks for the job.
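With N reduce tasks, each map output key is routed to one of them; Hadoop's default HashPartitioner uses (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reimplements that formula standalone (the keys are made up) to show why a job left at the default of 1 reducer sends everything to a single task.

```java
// Sketch: the default hash-partitioning formula, reimplemented without Hadoop.
// Keys in main() are illustrative only.
public class PartitionDemo {
    public static int partition(Object key, int numReduceTasks) {
        // Clear the sign bit so negative hash codes still map to a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "d"};
        for (int reducers : new int[] {1, 3}) {
            StringBuilder sb = new StringBuilder("reduces=" + reducers + ":");
            for (String k : keys) {
                sb.append(' ').append(k).append("->").append(partition(k, reducers));
            }
            // prints "reduces=1: a->0 b->0 c->0 d->0"
            // then   "reduces=3: a->1 b->2 c->0 d->1"
            System.out.println(sb);
        }
    }
}
```

Note that the -D option only takes effect if the driver parses generic options, for example by implementing Tool and running through ToolRunner; otherwise the driver can call Job.setNumReduceTasks(int) directly.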

On Thu, Jan 16, 2014 at 12:45 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:

Dear All,

I have a 3-node cluster with a MapReduce job running on it, and 8 data blocks spread across all 3 nodes. While the job runs I can see 8 map tasks, but only 1 reduce task. Is there a way to configure multiple reduce tasks?

--Ashish

--
Regards,
Sudhakara.st