Uniform data distribution across HDFS is one of the factors that ensures map tasks are distributed uniformly across nodes. Reduce tasks, however, don't depend on data distribution; their placement is based purely on slot availability.
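For reference, these are the MR1-era knobs involved: reduce slots per TaskTracker are a cluster-side setting, while the reduce-task count is per job. A minimal mapred-site.xml sketch (property names are the classic Hadoop 1.x ones; the values here are only illustrative):

```xml
<!-- mapred-site.xml: illustrative values, not recommendations -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- reduce slots available on each TaskTracker -->
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>8</value> <!-- default number of reduce tasks per job -->
</property>
```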
Sent from remote device, Please excuse typos
From: Mohammad Tariq <[EMAIL PROTECTED]>
Date: Wed, 17 Apr 2013 10:46:27
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>; Bejoy Ks<[EMAIL PROTECTED]>
Subject: Re: How to balance reduce job
Just to add to Bejoy's comments, it also depends on the data distribution.
Is your data properly distributed across HDFS?
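If you're not sure, the block distribution can be checked and evened out from the command line (assuming a Hadoop 1.x cluster and the appropriate HDFS user; `/path/to/input` is a placeholder for your own data):

```
# Show per-datanode capacity and usage; a very uneven "DFS Used%" suggests skew.
hadoop dfsadmin -report

# List where the blocks of a given file actually live.
hadoop fsck /path/to/input -files -blocks -locations

# Move blocks around until every datanode is within 10% of the cluster average.
hadoop balancer -threshold 10
```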
On Wed, Apr 17, 2013 at 10:39 AM, <[EMAIL PROTECTED]> wrote:
> Hi Rauljin
> Few things to check here.
> What is the number of reduce slots in each Task Tracker? What is the
> number of reduce tasks for your job?
> Based on the availability of slots the reduce tasks are scheduled on TTs.
> You can do the following
> Set the number of reduce tasks to 8 or more.
> Play with the number of reduce slots per TaskTracker (though tweaking this
> at the job level is not very advisable).
> Reducers are scheduled purely based on slot availability, so it isn't easy
> to ensure that all TaskTrackers end up evenly loaded with the same number
> of reducers.
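The slot-based placement described above can be sketched as a toy simulation. This is not Hadoop code (real scheduling is heartbeat-driven and less even than this greedy model), but it shows why a job with only 2 reduce tasks can never occupy more than 2 nodes, while 8 or more tasks can spread across all 8 TaskTrackers:

```python
# Toy model of MR1 slot-based reduce scheduling (illustrative only).
# Assumes 8 TaskTrackers with 2 reduce slots each, greedy assignment.

def schedule_reduces(num_trackers, slots_per_tracker, num_reduce_tasks):
    """Return how many reduce tasks land on each TaskTracker."""
    load = [0] * num_trackers
    for _ in range(num_reduce_tasks):
        # Trackers that still have a free reduce slot.
        candidates = [t for t in range(num_trackers)
                      if load[t] < slots_per_tracker]
        if not candidates:
            break  # no free slots; remaining tasks wait for the next wave
        # Greedily place on the least-loaded tracker with a free slot.
        load[min(candidates, key=lambda t: load[t])] += 1
    return load

# A job with 2 reduce tasks can only ever use 2 of the 8 nodes:
print(schedule_reduces(8, 2, 2))   # -> [1, 1, 0, 0, 0, 0, 0, 0]
# With 8 reduce tasks, every node gets reduce work:
print(schedule_reduces(8, 2, 8))   # -> [1, 1, 1, 1, 1, 1, 1, 1]
```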
> Bejoy KS
> Sent from remote device, Please excuse typos
> *From: * rauljin <[EMAIL PROTECTED]>
> *Date: *Wed, 17 Apr 2013 12:53:37 +0800
> *To: *[EMAIL PROTECTED]<[EMAIL PROTECTED]>
> *ReplyTo: * [EMAIL PROTECTED]
> *Subject: *How to balance reduce job
> There are 8 datanodes in my Hadoop cluster, but when a reduce job runs,
> only 2 datanodes are running the job.
> I want all 8 datanodes to run the reduce job, so I can balance the
> I/O pressure.
> Any ideas?