Hadoop, mail # dev - Setting number of parallel Reducers and Mappers for optimal performance


Pavan Kulkarni 2012-08-10, 20:24
Arun C Murthy 2012-08-11, 02:34
Pavan Kulkarni 2012-08-11, 04:17
Re: Setting number of parallel Reducers and Mappers for optimal performance
Arun C Murthy 2012-08-12, 04:37
Pavan,
 
On Aug 10, 2012, at 9:17 PM, Pavan Kulkarni wrote:

> Arun,
>
>  Thanks a lot for your response.
>
> I am running on a 16 core Xeon processor and 12 spindles. So running 12
> Mappers with 2G and 6 Reducers with 3G might give me the best
> performance.

Hmm... ok. You actually _may_ have enough CPU to drive a slightly higher number of tasks.
You could also measure that against 16 maps with 1.5G each and 6 reduces with 3G each.
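For concreteness, here is a minimal Java sketch of the arithmetic behind the two layouts being compared, using only the numbers from this thread (nothing in it is a Hadoop API). Both layouts come out to 42G of task heap on the 48G box, so the real variable is map parallelism versus per-map heap:

// Back-of-the-envelope check of the two layouts discussed above.
public class SlotMemoryCheck {
    static double totalHeapGb(int maps, double mapGb, int reduces, double reduceGb) {
        return maps * mapGb + reduces * reduceGb;
    }

    public static void main(String[] args) {
        double a = totalHeapGb(12, 2.0, 6, 3.0);  // 12 maps x 2G + 6 reduces x 3G = 42.0 GB
        double b = totalHeapGb(16, 1.5, 6, 3.0);  // 16 maps x 1.5G + 6 reduces x 3G = 42.0 GB
        System.out.printf("Layout A: %.1f GB, Layout B: %.1f GB of the 48 GB box%n", a, b);
        // Both leave roughly 6 GB for the OS plus the DataNode and TaskTracker
        // daemons, which is why measuring the two against each other is the
        // only way to pick a winner.
    }
}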

> Also is there a general formula to arrive at those numbers?
>

 I'll think about it, but as with all systems, Hadoop performance is part experience, part knowledge of the workload, part black magic, and part simple measurement... *smile*

Arun
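There is no official formula, but as a rough starting point in the spirit of the defaults quoted earlier in the thread, the sketch below turns cores, spindles and RAM into candidate values for mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum plus child-JVM heap sizes. The 0.75/0.50 factors, the one-map-per-spindle cap, the 4 GB reserve for the OS and daemons, and the even RAM split between maps and reduces are all assumptions for illustration, not Hadoop's own tuning rules:

// Illustrative only: derives candidate slot counts and child heaps from the
// hardware described in this thread. The constants are assumptions, not a
// formula shipped with Hadoop.
public class SlotStartingPoint {
    public static void main(String[] args) {
        int cores = 16;        // CPUs on the box
        int spindles = 12;     // local disks
        int totalRamGb = 48;   // physical memory
        int reservedGb = 4;    // OS, DataNode, TaskTracker

        // Map slots: roughly 0.75 * cores, capped at one per spindle.
        int mapSlots = Math.min((int) (cores * 0.75), spindles);
        // Reduce slots: roughly 0.50 * cores, and about half the map slots.
        int reduceSlots = Math.min((int) (cores * 0.50), mapSlots / 2);

        // Split the remaining RAM evenly between the map and reduce sides.
        int usableMb = (totalRamGb - reservedGb) * 1024;
        int mapHeapMb = (usableMb / 2) / mapSlots;       // ~1.8G per map here
        int reduceHeapMb = (usableMb / 2) / reduceSlots; // ~3.7G per reduce here

        // Candidate values for mapred-site.xml on each TaskTracker; the heap
        // sizes go into the child JVM opts (mapred.child.java.opts in Hadoop 1.x).
        System.out.println("mapred.tasktracker.map.tasks.maximum = " + mapSlots);
        System.out.println("mapred.tasktracker.reduce.tasks.maximum = " + reduceSlots);
        System.out.println("map child heap    : -Xmx" + mapHeapMb + "m");
        System.out.println("reduce child heap : -Xmx" + reduceHeapMb + "m");
    }
}

On the 16-core, 12-spindle, 48G box described above this prints 12 map slots at about -Xmx1877m and 6 reduce slots at about -Xmx3754m, which lands close to the 12 x 2G / 6 x 3G layout being compared in this thread. It is a starting point to measure from, not a substitute for the measurement.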

> On Fri, Aug 10, 2012 at 7:34 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>> Pavan,
>>
>> A very important factor is how much CPU and how many spindles you have...
>>
>> Your proposal for memory (44G in all) seems reasonable.
>>
>> However, if you have 12 spindles and sufficient CPU, I'd do something like
>> 10 or 12 maps of 2G each and 6 reduces with 3G/4G each, depending on how you
>> want to slice/dice your slots.
>>
>> Arun
>>
>> On Aug 10, 2012, at 1:24 PM, Pavan Kulkarni wrote:
>>
>>> Hi,
>>>
>>> I was trying to optimize Hadoop-1.0.2 performance by setting
>>> mapred.tasktracker.map.tasks.maximum and
>>> mapred.tasktracker.reduce.tasks.maximum
>>> such that the entire memory is utilized. The suggested tuning for these
>>> parameters is (CPUS > 2) ? (CPUS * 0.50) : 1 for reduces and
>>> (CPUS > 2) ? (CPUS * 0.75) : 1 for maps.
>>> I didn't quite get how they arrived at this suggestion. Isn't the setting
>>> dependent on the main memory available?
>>> For example, I had 48GB of memory and I split the parameters as 32 for
>>> mappers, 12 for reducers, and the remaining 4 for the OS and other processes.
>>> Please correct me if my assumption is wrong. Also, please suggest a way to
>>> get the optimal performance by setting these parameters. Thanks.
>>>
>>> --
>>>
>>> --With Regards
>>> Pavan Kulkarni
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
>
> --
>
> --With Regards
> Pavan Kulkarni

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/