Re: why does the number of map tasks get overridden?
3) Similar to 2), you could consider multithreading. Each physical
node would then only need to hold in memory what a single map requires
while having the processing power of many. But it will depend on
your context, i.e. how you are using the memory.
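
A minimal sketch of that multithreading idea, using the MultithreadedMapper
that ships with Hadoop (new API; MyMapper and the thread count are
illustrative assumptions, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedJobSketch {

        // Hypothetical memory-heavy mapper; each runner thread gets its
        // own instance, but all threads share the single task JVM's heap.
        public static class MyMapper
                extends Mapper<LongWritable, Text, Text, Text> {
        }

        public static Job configure(Configuration conf) throws Exception {
            Job job = new Job(conf, "memory-heavy-job");
            job.setMapperClass(MultithreadedMapper.class);           // the wrapper
            MultithreadedMapper.setMapperClass(job, MyMapper.class); // real work
            MultithreadedMapper.setNumberOfThreads(job, 4);          // threads per slot
            return job;
        }
    }

The node then pays the memory cost of one slot while using several cores,
provided the footprint is dominated by per-record work rather than
per-mapper-instance state.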

But 1) is really the key indeed: <number of slots per physical node> *
<maximum memory per slot> shouldn't exceed what is available on
your physical node.
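
To make that arithmetic concrete (numbers are illustrative only): a node
with 24 GB left over for tasks and a 2 GB heap per child JVM can afford at
most 12 slots, since 12 * 2 GB = 24 GB. In Hadoop 1.x terms that pair of
knobs lives in mapred-site.xml:

    <!-- illustrative values: slots * per-task heap must fit in the RAM
         left after the DataNode/TaskTracker daemons take their share -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>12</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>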

Regards

Bertrand

On Wed, Aug 22, 2012 at 8:03 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

> Hi
>
> There are two options I can think of now
>
> 1) If all your jobs are memory intensive, I'd recommend you adjust your
> task slots per node accordingly.
> 2) If only a few jobs are memory intensive, you can have each map task
> process a smaller volume of data. For that, set mapred.max.split.size to the
> maximum data chunk a map task can process within your current memory
> constraints.
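
For illustration, a minimal sketch of option 2 with the new-API
FileInputFormat (the 64 MB cap is an arbitrary example, not a
recommendation):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitCapSketch {
        // Cap every input split at 64 MB so no single map task has to
        // hold more than that much input; this writes
        // mapred.max.split.size into the job configuration.
        public static void capSplits(Job job) {
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        }
    }

Smaller splits mean more map tasks overall, but each one fits in a
smaller heap.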
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> From: nutch buddy <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2012 08:57:31 +0300
> To: <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: Re: why does the number of map tasks get overridden?
>
> So what can I do if I have a given input and my job needs a lot of memory
> per map task?
> I can't control the number of map tasks, and my total memory per machine
> is limited - I'll eventually fill each machine's memory.
>
> On Tue, Aug 21, 2012 at 3:52 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>
>> Actually controlling the number of maps is subtle. The mapred.map.tasks
>>> parameter is just a hint to the InputFormat for the number of maps. The
>>> default InputFormat behavior is to split the total number of bytes into the
>>> right number of fragments. However, in the default case the DFS block size
>>> of the input files is treated as an upper bound for input splits. A lower
>>> bound on the split size can be set via mapred.min.split.size. Thus, if you
>>> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
>>> maps, unless your mapred.map.tasks is even larger. Ultimately the
>>> InputFormat <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html> determines the number of maps.
>>>
>>
>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
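
The paragraph quoted above boils down to roughly this calculation (a
paraphrase of the old-API FileInputFormat logic, not the exact Hadoop
source):

    // goalSize is where the mapred.map.tasks "hint" enters; the DFS block
    // size is the usual upper bound, mapred.min.split.size the lower bound.
    static long splitSize(long totalBytes, int requestedMaps,
                          long minSplitSize, long blockSize) {
        long goalSize = totalBytes / Math.max(1, requestedMaps);
        return Math.max(minSplitSize, Math.min(goalSize, blockSize));
    }
    // With 10 TB of input and 128 MB blocks, goalSize is huge for any sane
    // hint, so the block size wins: ~82,000 splits regardless of the hint.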
>>
>> Bertrand
>>
>>
>> On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <[EMAIL PROTECTED]> wrote:
>>
>>> I configure a job in Hadoop and set the number of map tasks in the code to
>>> 8.
>>>
>>> Then I run the job and it gets 152 map tasks. I can't see why it's being
>>> overridden and where it gets 152 from.
>>>
>>> The mapred-site.xml has 24 as mapred.map.tasks.
>>>
>>> Any idea?
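
For what it's worth, a figure like 152 usually falls straight out of
input-size / split-size arithmetic rather than either setting: with 64 MB
blocks, for instance, ~9.5 GB of input gives 9728 MB / 64 MB = 152 splits.
Those numbers are guesses for illustration; the mechanism is the one in
the wiki passage quoted above.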
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
--
Bertrand Dechoux