3) Similarly to 2), you could consider multithreading. Each physical
node would then only need to hold in memory the equivalent of what one map
requires, while still having the processing power of many. But it will depend
on your context, i.e. how you are using the memory.
But 1) is really the key: <number of slots per physical node> *
<maximum memory per slot> shouldn't exceed what is available on
your physical node.
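
To make that rule concrete, here is a rough sketch of the arithmetic. The function names and all the numbers (a 16 GB node, 2 GB set aside for the daemons and the OS, 2 GB per task) are made up for illustration. On a Hadoop 1.x cluster the inputs would typically come from mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum and the -Xmx value in mapred.child.java.opts.

```python
def max_slots(node_memory_mb, reserved_mb, memory_per_slot_mb):
    """Largest number of task slots whose combined memory fits on the node."""
    usable_mb = node_memory_mb - reserved_mb
    if usable_mb <= 0 or memory_per_slot_mb <= 0:
        return 0
    return usable_mb // memory_per_slot_mb


def fits(slots, memory_per_slot_mb, node_memory_mb, reserved_mb=0):
    """True if slots * memory per slot stays within the usable node memory."""
    return slots * memory_per_slot_mb <= node_memory_mb - reserved_mb


if __name__ == "__main__":
    # Hypothetical node: 16 GB RAM, 2 GB reserved, 2 GB per task.
    print(max_slots(16384, 2048, 2048))
    print(fits(8, 2048, 16384, reserved_mb=2048))
```

With these example numbers, 8 slots of 2 GB do not fit once the reserved memory is subtracted, so you would either lower the slot count or lower the per-task memory.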
On Wed, Aug 22, 2012 at 8:03 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> There are two options I can think of now
> 1) If all your jobs are memory intensive, I'd recommend adjusting your
> task slots per node accordingly.
> 2) If only a few jobs are memory intensive, you can have each map task
> process a smaller volume of data. For that, set mapred.max.split.size to the
> maximum data chunk a map task can process with your current memory.
> Bejoy KS
> Sent from handheld, please excuse typos.
> *From: * nutch buddy <[EMAIL PROTECTED]>
> *Date: *Wed, 22 Aug 2012 08:57:31 +0300
> *To: *<[EMAIL PROTECTED]>
> *ReplyTo: * [EMAIL PROTECTED]
> *Subject: *Re: why is num of map tasks gets overridden?
> So what can I do if I have a given input and my job needs a lot of memory
> per map task?
> I can't control the number of map tasks, and my total memory per machine
> is limited - I'll eventually fill up each machine's memory.
> On Tue, Aug 21, 2012 at 3:52 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>> Actually controlling the number of maps is subtle. The mapred.map.tasks
>>> parameter is just a hint to the InputFormat for the number of maps. The
>>> default InputFormat behavior is to split the total number of bytes into the
>>> right number of fragments. However, in the default case the DFS block size
>>> of the input files is treated as an upper bound for input splits. A lower
>>> bound on the split size can be set via mapred.min.split.size. Thus, if you
>>> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
>>> maps, unless your mapred.map.tasks is even larger. Ultimately the
>>> InputFormat <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html> determines the number of maps.
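
The paragraph quoted above can be turned into a small calculation. This is only a simplified sketch of how the default file-based InputFormat picks a split size, max(min split size, min(total / requested maps, block size)). It treats the whole input as one file and ignores the slack Hadoop allows on the last split, so real counts will differ somewhat. The function names are illustrative, not Hadoop API.

```python
def split_size(total_bytes, requested_maps, block_size, min_split_size=1):
    """Simplified split size: the block size caps it, the minimum floors it."""
    goal_size = total_bytes // max(requested_maps, 1)
    return max(min_split_size, min(goal_size, block_size))


def estimated_maps(total_bytes, requested_maps, block_size, min_split_size=1):
    """Approximate number of map tasks for a single large input."""
    size = split_size(total_bytes, requested_maps, block_size, min_split_size)
    return -(-total_bytes // size)  # integer ceiling division


if __name__ == "__main__":
    TB = 1024 ** 4
    MB = 1024 ** 2
    # 10 TB of input, 128 MB blocks, mapred.map.tasks hint of 24.
    print(estimated_maps(10 * TB, 24, 128 * MB))
    # Same input, but with the minimum split size raised to 512 MB.
    print(estimated_maps(10 * TB, 24, 128 * MB, min_split_size=512 * MB))
```

The first call gives 81920, i.e. the 82k maps mentioned in the quote: a small hint such as 24 has no effect because the block size caps the split. Raising the minimum split size is what actually reduces the number of maps, at the price of more data (and possibly more memory) per map.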
>> On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <[EMAIL PROTECTED]> wrote:
>>> I configure a job in Hadoop, set the number of map tasks in the code to
>>> Then I run the job and it gets 152 map tasks. I can't see why it's being
>>> overridden and where it gets 152 from.
>>> The mapred-site.xml has 24 as mapred.map.tasks.
>>> any idea?
>> Bertrand Dechoux