Hadoop, mail # user - Re: Need help optimizing reducer


Re: Need help optimizing reducer
Mahesh Balija 2013-03-05, 09:00
The reducer looks fast up to 66% because that part of the progress bar covers
only the shuffle and sort phases of the reduce task; your actual reduce code
has NOT started running yet.
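
To make the 66% figure concrete, here is a simplified model of how a reduce task's progress could be computed, assuming three equally weighted phases: copy/shuffle, sort, and reduce. The class and enum names are hypothetical; this is a sketch of the arithmetic, not Hadoop's own ReduceTask code.

```java
import java.util.Locale;

public class ReduceProgressModel {
    // Hypothetical phase names; each phase is assumed to weigh exactly one third.
    enum Phase { COPY, SORT, REDUCE }

    // Overall task progress in [0, 1], given the current phase and the
    // fraction of that phase already completed.
    static double overallProgress(Phase phase, double fractionInPhase) {
        if (fractionInPhase < 0.0 || fractionInPhase > 1.0) {
            throw new IllegalArgumentException("fractionInPhase must be in [0, 1]");
        }
        int phaseCount = Phase.values().length; // 3 phases
        // Completed phases count fully; the current phase counts proportionally.
        return (phase.ordinal() + fractionInPhase) / phaseCount;
    }

    public static void main(String[] args) {
        // Shuffle and sort are done, but the user's reduce() has not seen a key yet.
        System.out.printf(Locale.ROOT, "%.1f%%%n", 100 * overallProgress(Phase.REDUCE, 0.0)); // prints 66.7%
        // Halfway through the keys in the user's reduce().
        System.out.printf(Locale.ROOT, "%.1f%%%n", 100 * overallProgress(Phase.REDUCE, 0.5)); // prints 83.3%
    }
}
```

With shuffle and sort complete, the model already reports 66.7% before a single key reaches reduce(), so all of the user code's work has to fit into the last third of the bar.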

The reduce side is divided into 3 phases of ~33% each: shuffle (fetching
the map output), sort, and finally your own code (reduce). That is why your
reduce task moves quickly up to 66%. To speed up the job, you either need
more reducers or need to make your reduce code as efficient as possible.
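
A rough back-of-envelope check with the numbers from the original question (about 600 thousand keys, about 30 MB, about 2 hours) suggests where the time goes. It assumes, as an upper bound, that the whole 2 hours is spent in the reduce phase, and treats 30 MB as 30,000,000 bytes; the class and method names are made up for illustration.

```java
import java.util.Locale;

public class ReducerTimeBudget {
    // Average wall-clock milliseconds per key, assuming (upper bound) that
    // all of the elapsed time is spent in the user's reduce() calls.
    static double millisPerKey(double elapsedSeconds, long keys) {
        if (keys <= 0) throw new IllegalArgumentException("keys must be positive");
        return elapsedSeconds * 1000.0 / keys;
    }

    // Average input bytes per key.
    static double bytesPerKey(long totalBytes, long keys) {
        if (keys <= 0) throw new IllegalArgumentException("keys must be positive");
        return (double) totalBytes / keys;
    }

    public static void main(String[] args) {
        long keys = 600_000L;           // "around 600 thousand unique keys"
        double elapsed = 2 * 60 * 60;   // "around 2 hours", in seconds
        long bytes = 30_000_000L;       // "around 30 mb", taken as 30 * 10^6 bytes
        System.out.printf(Locale.ROOT, "%.1f ms per key%n", millisPerKey(elapsed, keys));  // prints 12.0 ms per key
        System.out.printf(Locale.ROOT, "%.0f bytes per key%n", bytesPerKey(bytes, keys)); // prints 50 bytes per key
    }
}
```

Around 12 ms per key of only about 50 bytes is far more than reading that much data should cost, which likely points at per-key work inside reduce() as the bottleneck rather than data volume.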

Best,
Mahesh Balija,
Calsoft Labs.

On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I have 1 reducer and I have around 600 thousand unique keys coming to it.
> The total data is only around 30 MB.
> My logic doesn't allow me to have more than 1 reducer.
> It's taking too long to complete, around 2 hours. (It's fast until 66%,
> then it slows down. I don't really think it has started doing anything
> before 66%, so why does it show progress like that?)
> Are there any job execution parameters that can help improve reducer
> performance?
> Any suggestions to improve things when we have to live with just one
> reducer?
>
> thanks,
> Austin
>
Fatih Haltas 2013-03-05, 09:46
Fatih Haltas 2013-03-05, 09:59