Re: Setting number of parallel Reducers and Mappers for optimal performance
Pavan,
 
On Aug 10, 2012, at 9:17 PM, Pavan Kulkarni wrote:

> Arun,
>
>  Thanks a lot for your response.
>
> I am running on a 16-core Xeon processor with 12 spindles. So running 12
> Mappers with 2G and 6 Reducers with 3G might give me the best
> performance.

Hmm... ok. You actually _may_ have enough CPU to drive a slightly higher number of tasks.
You could also measure that against 16 maps with 1.5G each and 6 reduces with 3G each.
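As a sketch of what the first layout might look like in mapred-site.xml (the property names are the tasktracker slot limits quoted later in this thread; the per-task heap settings are an assumption, and if your 1.0.2 build lacks the split map/reduce opts, the single mapred.child.java.opts applies to both):

  <!-- Sketch only: one possible slot layout for a 16-core, 48G, 12-spindle node. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
  <!-- Assumed per-task heaps: 2G per map, 3G per reduce. -->
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx3072m</value>
  </property>

For the 16-map variant, raise the map slot maximum to 16 and drop the map heap to -Xmx1536m; either way the task heap totals 42G, inside the 44G budget mentioned below.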

> Also is there a general formula to arrive at those numbers?
>

 I'll think about it, but as with all systems, Hadoop performance is part experience, part knowledge of the workload, part black magic, and part simple measurement... *smile*

Arun

> On Fri, Aug 10, 2012 at 7:34 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
>> Pavan,
>>
>> A very important factor is how much CPU and how many spindles you have...
>>
>> Your proposal for memory (44G in all) seems reasonable.
>>
>> However, if you have 12 spindles and sufficient CPU, I'd do something like
>> 10 or 12 maps of 2G each and 6 reduces with 3G/4G each depending on how you
>> want to slice/dice your slots.
>>
>> Arun
>>
>> On Aug 10, 2012, at 1:24 PM, Pavan Kulkarni wrote:
>>
>>> Hi,
>>>
>>> I was trying to optimize Hadoop-1.0.2 performance by setting
>>> mapred.tasktracker.map.tasks.maximum and
>>> mapred.tasktracker.reduce.tasks.maximum
>>> such that the entire memory is utilized. The suggested tuning for these
>>> parameters is given as (CPUS > 2) ? (CPUS * 0.75) : 1 for maps and
>>> (CPUS > 2) ? (CPUS * 0.50) : 1 for reduces.
>>> I didn't quite get how they arrived at this suggestion. Isn't the setting
>>> dependent on the main memory available?
>>> For example, I had 48GB of memory and I split it as 32GB for mappers,
>>> 12GB for reducers, and the remaining 4GB for the OS and other processes.
>>> Please correct me if my assumption is wrong. Also, please suggest a way
>>> to get optimal performance by setting these parameters. Thanks.
>>>
>>> --
>>>
>>> --With Regards
>>> Pavan Kulkarni
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
>
> --
>
> --With Regards
> Pavan Kulkarni

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
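
A back-of-envelope check of the heuristic quoted above: on a 16-core box, (CPUS > 2) ? (CPUS * 0.75) : 1 gives 12 map slots and (CPUS > 2) ? (CPUS * 0.50) : 1 gives 8 reduce slots. At the heap sizes discussed here that would be 12 x 2G + 8 x 3G = 48G, i.e. the entire box, which is why the layouts above trim the reduce side: 12 x 2G + 6 x 3G = 42G stays within the 44G task budget and leaves about 6G of the 48G for the OS and the Hadoop daemons.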