

Re: How to balance reduce job
The number of reducers running depends on the data available.

Thanks & Regards


Shashwat Shriparv

On Tue, May 7, 2013 at 8:43 PM, Tony Burton <[EMAIL PROTECTED]> wrote:

> The typical Partitioner method for assigning reducer r from reducers R is
>
> r = hash(key) % count(R)
>
> However, if you find your partitioner is assigning your data to too few
> reducers, or to just one, I found that changing count(R) to the next odd
> number or (even better) the next prime number above count(R) is a good rule
> of thumb to follow.
>
> Tony
>
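The modulo rule in Tony's message can be tried out without a cluster. The following is a minimal sketch, not code from the thread: the class name `PartitionSketch`, the helper `histogram`, and the key set (1000 integers that are all multiples of 8) are hypothetical, chosen so the keys share a factor with the reducer count. The sign mask is an addition that keeps the result non-negative for negative hash codes.

```java
import java.util.Arrays;

class PartitionSketch {
    // Same shape as r = hash(key) % count(R); the mask keeps r non-negative.
    static int partition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many of the given keys land on each reducer.
    static int[] histogram(Object[] keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (Object key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical keys: every hash code is a multiple of 8.
        Integer[] keys = new Integer[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 8;
        }
        // With 8 reducers every key maps to reducer 0.
        System.out.println("R=8:  " + Arrays.toString(histogram(keys, 8)));
        // 11 is the next prime above 8 and shares no factor with the keys.
        System.out.println("R=11: " + Arrays.toString(histogram(keys, 11)));
    }
}
```

With these particular keys, 8 reducers leave seven of them idle, while 11 reducers receive 90 or 91 keys each. Real key sets will behave differently, so this only illustrates why a reducer count that shares no factor with the hash values can help.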
> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> *Sent:* 17 April 2013 07:19
> *To:* [EMAIL PROTECTED]
> *Cc:* Mohammad Tariq
>
> *Subject:* Re: How to balance reduce job
>
> Yes, that is a valid point.
>
> The partitioner might distribute keys non-uniformly, leaving reducers
> unevenly loaded.
>
> But this doesn't change the number of reducers or their distribution across
> nodes. The underlying issue, as I understand it, is that his reduce tasks
> are scheduled on just a few nodes.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
>
> *From:* Ajay Srivastava <[EMAIL PROTECTED]>
> *Date:* Wed, 17 Apr 2013 06:02:30 +0000
> *To:* <[EMAIL PROTECTED]><[EMAIL PROTECTED]>; <[EMAIL PROTECTED]><[EMAIL PROTECTED]>
> *ReplyTo:* [EMAIL PROTECTED]
> *Cc:* Mohammad Tariq <[EMAIL PROTECTED]>
> *Subject:* Re: How to balance reduce job
>
> Tariq probably meant the distribution of keys in the <key, value> pairs
> emitted by the mapper.
>
> The Partitioner distributes these pairs to different reducers based on the
> key. If the keys are skewed, most of the records may go to the same reducer.
>
> Regards,
> Ajay Srivastava
>
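Ajay's point about skewed keys can be illustrated with a small sketch. It is not from the thread: the class name `KeySkewSketch`, the helpers, and the data (900 records sharing the hypothetical key `hot-key` plus 100 records with distinct keys) are assumptions made for illustration. Because all records with the same key go to the same reducer, changing the reducer count does not spread a dominant key.

```java
import java.util.ArrayList;
import java.util.List;

class KeySkewSketch {
    // Hash-modulo partitioning; the mask keeps the result non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns the largest number of records any single reducer receives.
    static int maxLoad(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        int max = 0;
        for (String key : keys) {
            int r = partition(key, numReducers);
            counts[r]++;
            max = Math.max(max, counts[r]);
        }
        return max;
    }

    // Hypothetical skewed input: one key accounts for 90% of the records.
    static List<String> skewedKeys() {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 900; i++) {
            keys.add("hot-key");
        }
        for (int i = 0; i < 100; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = skewedKeys();
        for (int r : new int[] {8, 11, 13}) {
            System.out.println("R=" + r + ": busiest reducer gets " + maxLoad(keys, r));
        }
    }
}
```

Whatever reducer count is chosen, the busiest reducer in this sketch receives at least the 900 records of the dominant key, which is the uneven load Ajay describes.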
> On 17-Apr-2013, at 11:08 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
>
> Uniform data distribution across HDFS is one of the factors that ensures
> map tasks are uniformly distributed across nodes. But reduce task placement
> doesn't depend on data distribution; it is based purely on slot availability.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
>
> *From:* Mohammad Tariq <[EMAIL PROTECTED]>
> *Date:* Wed, 17 Apr 2013 10:46:27 +0530
> *To:* [EMAIL PROTECTED] <[EMAIL PROTECTED]>; Bejoy Ks <[EMAIL PROTECTED]>
> *Subject:* Re: How to balance reduce job
>
> Just to add to Bejoy's comments, it also depends on the data distribution.
> Is your data properly distributed across HDFS?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, Apr 17, 2013 at 10:39 AM, <[EMAIL PROTECTED]> wrote:
>
> Hi Rauljin,
>
> A few things to check here:
> What is the number of reduce slots in each TaskTracker (TT)?
> What is the number of reduce tasks for your job?
> Reduce tasks are scheduled on TTs based on the availability of slots.
>
> You can do the following:
> Set the number of reduce tasks to 8 or more.
> Adjust the number of reduce slots (though tweaking this at the job level is
> not very advisable).
>
> Reducers are scheduled purely on slot availability, so it won't be easy to
> ensure that all TTs are evenly loaded with the same number of reducers.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
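The relationship between reduce task count, slots, and busy nodes in Bejoy's advice can be bounded with simple arithmetic. The sketch below is a back-of-the-envelope model, not Hadoop's scheduler: it assumes every node has the same number of reduce slots and looks only at the first wave of tasks. The class name `ReduceSlotBounds` and the value of 2 slots per node are hypothetical; only the 8 nodes come from the original question.

```java
class ReduceSlotBounds {
    // Fewest nodes that can be busy in the first wave: slots are packed densely.
    static int minBusyNodes(int reduceTasks, int nodes, int slotsPerNode) {
        int packed = (reduceTasks + slotsPerNode - 1) / slotsPerNode; // ceiling division
        return Math.min(packed, nodes);
    }

    // Most nodes that can be busy: one task per node until nodes run out.
    static int maxBusyNodes(int reduceTasks, int nodes) {
        return Math.min(reduceTasks, nodes);
    }

    public static void main(String[] args) {
        int nodes = 8;        // from the original question
        int slotsPerNode = 2; // assumption, not stated in the thread
        for (int tasks : new int[] {2, 8, 16}) {
            System.out.println(tasks + " reduce tasks: between "
                    + minBusyNodes(tasks, nodes, slotsPerNode) + " and "
                    + maxBusyNodes(tasks, nodes) + " busy nodes");
        }
    }
}
```

Under these assumptions, a job with 2 reduce tasks can never occupy more than 2 of the 8 nodes, a job with 8 reduce tasks may land on anywhere from 4 to 8 nodes, and only a job that fills all 16 slots is guaranteed to keep every node busy.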
> ------------------------------
>
> *From:* rauljin <[EMAIL PROTECTED]>
> *Date:* Wed, 17 Apr 2013 12:53:37 +0800
> *To:* [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> *ReplyTo:* [EMAIL PROTECTED]
> *Subject:* How to balance reduce job
>
> 8 datanodes in my Hadoop cluster; when running a reduce job, there is only 2