Re: How to balance reduce job
The number of reducers running depends on the data available.

Thanks & Regards

Shashwat Shriparv

On Tue, May 7, 2013 at 8:43 PM, Tony Burton <[EMAIL PROTECTED]> wrote:

>
> The typical Partitioner method for assigning reducer r from reducers R is
>
> r = hash(key) % count(R)
>
>
> However, if you find your partitioner is assigning your data to too few
> reducers (or just one), I found that changing count(R) to the next odd
> number, or (even better) the next prime number above count(R), is a good
> rule of thumb to follow.
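
A minimal sketch of a Partitioner following that scheme, assuming the org.apache.hadoop.mapreduce API with Text/IntWritable types (the class name and types are illustrative, not from the thread):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Illustrative hash partitioner: r = hash(key) % count(R).
    // count(R) arrives here as numPartitions, i.e. the job's reduce-task count,
    // so the rule of thumb above amounts to choosing an odd/prime number of reducers.
    public class HashModPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Mask the sign bit so the modulo result is never negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

With this in place, job.setPartitionerClass(HashModPartitioner.class) wires it in, and picking a prime reduce count (e.g. 7 or 11 instead of 8 or 10) follows the rule of thumb above.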
>
> ** **
>
> Tony****
>
>
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: 17 April 2013 07:19
> To: [EMAIL PROTECTED]
> Cc: Mohammad Tariq
> Subject: Re: How to balance reduce job
>
>
> Yes, that is a valid point.
>
> The partitioner might produce a non-uniform distribution, and the reducers can
> be unevenly loaded.
>
> But this doesn't change the number of reducers or their distribution across
> nodes. The underlying issue, as I understand it, is that his reduce tasks are
> scheduled on just a few nodes.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
>
> From: Ajay Srivastava <[EMAIL PROTECTED]>
> Date: Wed, 17 Apr 2013 06:02:30 +0000
> To: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
> ReplyTo: [EMAIL PROTECTED]
> Cc: Mohammad Tariq <[EMAIL PROTECTED]>
> Subject: Re: How to balance reduce job
>
>
> Tariq probably meant the distribution of keys in the <key, value> pairs emitted
> by the mapper.
>
> The partitioner distributes these pairs to different reducers based on the key.
> If the data is such that the keys are skewed, then most of the records may go
> to the same reducer.
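
As a rough standalone illustration of that skew (plain Java; the key values and reducer count are made up, not from the thread), identical keys always hash to the same partition, so a dominant key overloads one reducer no matter how many reducers the job has:

    import java.util.HashMap;
    import java.util.Map;

    public class SkewDemo {
        public static void main(String[] args) {
            int numReducers = 8;
            Map<Integer, Integer> load = new HashMap<>();
            for (int i = 0; i < 100; i++) {
                // 90% of the records share the key "hot"; the rest are unique.
                String key = (i < 90) ? "hot" : "key" + i;
                int r = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
                load.merge(r, 1, Integer::sum);
            }
            // One partition ends up with at least the 90 "hot" records.
            System.out.println(load);
        }
    }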
>
>
>
> Regards,
> Ajay Srivastava
>
>
> On 17-Apr-2013, at 11:08 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
>
> Uniform data distribution across HDFS is one of the factors that ensures map
> tasks are uniformly distributed across nodes. But reduce tasks don't depend on
> data distribution; they are scheduled purely based on slot availability.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
>
> From: Mohammad Tariq <[EMAIL PROTECTED]>
> Date: Wed, 17 Apr 2013 10:46:27 +0530
> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>; Bejoy Ks <[EMAIL PROTECTED]>
> Subject: Re: How to balance reduce job
>
>
> Just to add to Bejoy's comments, it also depends on the data distribution.
> Is your data properly distributed across HDFS?
>
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, Apr 17, 2013 at 10:39 AM, <[EMAIL PROTECTED]> wrote:
>
> Hi Rauljin
>
> Few things to check here.
> What is the number of reduce slots in each TaskTracker? What is the number of
> reduce tasks for your job?
> Based on the availability of slots, the reduce tasks are scheduled on TTs.
>
> You can do the following:
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable to tweak this at the job
> level).
>
> The reducers are scheduled purely based on slot availability, so it won't be
> that easy to ensure that all TTs are evenly loaded with the same number of
> reducers.
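
A minimal sketch of the job-side knob (assuming the org.apache.hadoop.mapreduce.Job API; the property names in the comments are the MRv1-era ones and may differ on newer releases):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReduceCountDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "balance-reduce-demo");
            // Ask for 8 reduce tasks; same effect as -D mapred.reduce.tasks=8
            // (mapreduce.job.reduces on newer versions) on the command line.
            job.setNumReduceTasks(8);
            System.out.println("Reduce tasks requested: " + job.getNumReduceTasks());
        }
    }

The per-TaskTracker slot count, by contrast, is a cluster-side setting (mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml on MRv1) and needs a TaskTracker restart, which is why tweaking it per job is not advisable.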
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
>
> From: rauljin <[EMAIL PROTECTED]>
> Date: Wed, 17 Apr 2013 12:53:37 +0800
> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> ReplyTo: [EMAIL PROTECTED]
> Subject: How to balance reduce job
>
>
> 8 datanodes in my Hadoop cluster; when running a reduce job, there is only 2