RE: How to configure multiple reduce jobs in hadoop 2.2.0
German Florez-Larrahondo 2014-01-17, 15:48
Yong

 

The simple explanation is that a Java application is not just limited by
the heap size.

As an example, Tom White's Hadoop: The Definitive Guide, page 323: the job's
own memory footprint also includes native libraries, Java's permgen space, etc.
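To make that concrete, here is a minimal sketch (the -XX:MaxPermSize value is
an illustrative assumption, not a setting from the book or this thread): the
JVM options can budget for a non-heap pool explicitly while still leaving
headroom under a 4096 MB Container limit.

In mapred-site.xml:

<property>
  <!-- Total physical memory YARN grants the map task's Container -->
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <!-- Heap plus an explicit permgen cap; the remaining Container
       memory covers thread stacks, native libraries, and other
       off-heap allocations -->
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m -XX:MaxPermSize=256m</value>
</property>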

 

 

http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA645

 

I encourage you to read more about memory management in Java applications
(not specifically for Hadoop).

 

Regards

./g

 

From: java8964 [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 17, 2014 9:39 AM
To: [EMAIL PROTECTED]
Subject: RE: How to configure multiple reduce jobs in hadoop 2.2.0

 

I read this blog, and have the following questions:

 

What is the relationship between "mapreduce.map.memory.mb" and
"mapreduce.map.java.opts"?

 

The blog gives the following settings as an example:

 

For our example cluster, we have the minimum RAM for a Container
(yarn.scheduler.minimum-allocation-mb) = 2 GB. We'll thus assign 4 GB for
Map task Containers, and 8 GB for Reduce task Containers.

In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each Container will run a JVM for the Map or Reduce task. The JVM heap size
should be set lower than the Map and Reduce memory defined above, so that it
stays within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>

The above settings configure the upper limit of the physical RAM that Map
and Reduce tasks will use.
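As a usage sketch (the jar, class, and path names below are placeholders, and
it assumes the driver uses ToolRunner so the generic options are parsed), the
same properties can also be overridden per job on the command line:

hadoop jar my-job.jar MyDriver \
  -D mapreduce.map.memory.mb=4096 \
  -D mapreduce.map.java.opts=-Xmx3072m \
  /input /output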

 

I am not sure why "mapreduce.map.java.opts" should be lower than
"mapreduce.map.memory.mb", as suggested above, or how that makes sense.

 

If the mapper task's JVM is set with a max heap size of 3 GB, and the Container
for the map task is capped at 4 GB, then what is the additional 1 GB of memory
used for?

 

Basically my questions are:

 

1) Why do we have these 2 configuration settings? From what I thought, one
should be enough?

2) For the above settings, my understanding is that from the application, the
max memory I can use for a mapper task is 3 GB, no matter what I ask for,
right? Does the additional 1 GB mean memory I can ask for outside of the JVM
heap?

 

Thanks

 

Yong

 

  _____  

Date: Fri, 17 Jan 2014 15:16:28 +0530
Subject: Re: How to configure multiple reduce jobs in hadoop 2.2.0
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Also check this:
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/

 

On Fri, Jan 17, 2014 at 2:56 PM, Silvina Caíno Lores <[EMAIL PROTECTED]>
wrote:

Also, you will be limited by your container configuration in yarn-site.xml
and mapred-site.xml; check this post to understand how resource management
works:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

 

Basically, you can set the number of reducers you want, but you are limited to
the number the system can actually hold, given the configuration you have set.
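To make that limit concrete (a rough sketch; the node memory figure is an
illustrative assumption, not from this thread): with 24 GB of NodeManager
memory per node and the 8 GB Reduce Containers above, at most 3 reduce
Containers can run on each node at once.

In yarn-site.xml:

<property>
  <!-- Physical memory the NodeManager may hand out to Containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>

24576 MB / 8192 MB per Reduce Container = 3 concurrent reducers per node, no
matter how many reducers the job requests.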

 

Hope it helps.

Regards,

Silvina

 

On 16 January 2014 08:54, sudhakara st <[EMAIL PROTECTED]> wrote:

Hello Ashish,

Using "-D mapreduce.job.reduces=<number>" with a fixed number of reducers will
spawn that many reducers for a job.
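For example (the jar and class names are placeholders, and this assumes the
driver uses ToolRunner so the generic -D option is parsed):

hadoop jar wordcount.jar WordCount -D mapreduce.job.reduces=4 /input /output

The same thing can be set in the driver code with job.setNumReduceTasks(4).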

 

On Thu, Jan 16, 2014 at 12:45 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:

Dear All,

I have a 3-node cluster and a map reduce job running on it. I have 8 data
blocks spread across all 3 nodes. While running the map reduce job I can see
8 map tasks running, but only 1 reduce task. Is there a way to configure
multiple reduce tasks?
      
Regards,
...Sudhakara.st