Re: Setting number of parallel Reducers and Mappers for optimal performance
Arun C Murthy 2012-08-12, 04:37
On Aug 10, 2012, at 9:17 PM, Pavan Kulkarni wrote:
> Thanks a lot for your response.
> I am running on a 16-core Xeon processor and 12 spindles. So running 12
> Mappers with 2G and 6 Reducers with 3G might give me the best
Hmm... ok. You actually _may_ have enough CPU to drive a slightly higher number of tasks.
You could also measure that against 16 maps with 1.5G each and 6 reduces with 3G each.
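
As a quick sanity check on the two layouts above, here is a minimal sketch of the memory arithmetic. The helper name `slot_memory_gb` is hypothetical, and the 48G node size is taken from the original question further down the thread; it only adds up task heap and says nothing about which layout actually runs faster.

```python
# Hypothetical helper (not part of Hadoop): total task heap, in GB,
# for a given number of map and reduce slots.
def slot_memory_gb(maps, map_gb, reduces, reduce_gb):
    return maps * map_gb + reduces * reduce_gb

NODE_MEMORY_GB = 48  # node size stated in the original question

layouts = {
    "12 maps x 2G + 6 reduces x 3G": slot_memory_gb(12, 2.0, 6, 3.0),
    "16 maps x 1.5G + 6 reduces x 3G": slot_memory_gb(16, 1.5, 6, 3.0),
}

for name, used in layouts.items():
    left = NODE_MEMORY_GB - used
    print(f"{name}: {used:.0f}G for tasks, {left:.0f}G left for OS and daemons")
```

Both layouts use the same total task heap, so the comparison comes down to whether the extra map tasks are limited by CPU or by the 12 spindles, which is what the measurement would show.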
> Also is there a general formula to arrive at those numbers?
I'll think about it, but as with all systems, Hadoop performance is some parts experience, some knowledge of workload, black-magic and simple measurement... *smile*
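
For reference, the CPU-based rule of thumb quoted in the original question further down this thread can be written out as a small function. This is only a sketch of that quoted expression: the name `slots_from_cpus` is hypothetical, and truncating the result to an integer is an assumption, since the quoted rule does not say how to round.

```python
# Sketch of the quoted rule of thumb: (CPUS > 2) ? (CPUS * fraction) : 1
# Assumption: the result is truncated to a whole number of slots.
def slots_from_cpus(cpus, fraction):
    return int(cpus * fraction) if cpus > 2 else 1

cpus = 16  # core count mentioned in this thread

map_slots = slots_from_cpus(cpus, 0.75)     # map fraction from the quoted rule
reduce_slots = slots_from_cpus(cpus, 0.50)  # reduce fraction from the quoted rule

print(f"map slots: {map_slots}, reduce slots: {reduce_slots}")
```

Note that the rule looks only at core count. It takes no account of memory or spindles, so its result is a starting point to measure against, not a substitute for measurement.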
> On Fri, Aug 10, 2012 at 7:34 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> A very important factor is how much CPU and how many spindles you have...
>> Your proposal for memory (44G in all) seems reasonable.
>> However, if you have 12 spindles and sufficient CPU I'd do something like
>> 10 or 12 maps of 2G each and 6 reduces with 3G/4G each depending on how you
>> want to slice/dice your slots.
>> On Aug 10, 2012, at 1:24 PM, Pavan Kulkarni wrote:
>>> I was trying to optimize Hadoop-1.0.2 performance by setting
>>> such that the entire memory is utilized. The tuning of this parameter is
>>> given as (CPUS > 2) ? (CPUS * 0.50): 1 for reduce and (CPUS > 2) ? (CPUS *
>>> 0.75): 1 for map.
>>> I didn't quite get how they arrived at this suggestion. Isn't the setting
>>> dependent on main memory available?
>>> For example I had 48GB of memory and I split the parameters as 32 for
>>> mappers and 12 for reducers and remaining 4 for OS and other processes.
>>> Please correct me if my assumption is wrong. Also suggest a way to get the
>>> optimal performance by setting these parameters. Thanks.
>>> --With Regards
>>> Pavan Kulkarni
>> Arun C. Murthy
>> Hortonworks Inc.
> --With Regards
> Pavan Kulkarni
Arun C. Murthy