Vitaliy Semochkin
2010-07-28, 19:24
Ted Yu
2010-07-28, 19:33
Joe Stein
2010-07-28, 19:40
Abhinay Mehta
2010-07-29, 10:31
Vitaliy Semochkin
2010-07-29, 11:20
Joe Stein
2010-07-29, 12:54
Raj V
2010-07-29, 16:37
-
what affects number of reducers launched by hadoop? Vitaliy Semochkin 2010-07-28, 19:24
Hi,

In my cluster mapred.tasktracker.reduce.tasks.maximum = 4, however while monitoring the job in the job tracker I see only 1 reducer working.

First it shows
reduce > copy
Can someone please explain what this means?

After that it shows
reduce > reduce

When I set the number of reduce tasks for a job programmatically to 10 with
job.setNumReduceTasks(10);
the number of "reduce > reduce" reducers increases to 10 and the performance of the application increases as well (the number of reducers never exceeds that value).

Can someone explain this behavior?

Thanks in advance,
Vitaliy S
-
Re: what affects number of reducers launched by hadoop? Ted Yu 2010-07-28, 19:33
The 3 stages for a reducer are:
copy
sort
reduce
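To make the three stages concrete, here is a toy, in-memory sketch in plain Java. It is not Hadoop code: the class name, method names, and sample data are all hypothetical, and ordinary collections stand in for the real machinery. The "reduce > copy" status in the job tracker corresponds to the first stage, where a reduce task fetches its share of the map outputs; "reduce > reduce" corresponds to the last stage, where the reduce function actually runs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: plain Java collections stand in for Hadoop's machinery.
class ReducePhasesSketch {

    // "copy": gather this reducer's share of every map task's output.
    static List<Map.Entry<String, Integer>> copy(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<Map.Entry<String, Integer>> fetched = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            fetched.addAll(output);
        }
        return fetched;
    }

    // "sort": order the fetched records by key so equal keys become adjacent.
    static void sort(List<Map.Entry<String, Integer>> records) {
        records.sort(Map.Entry.comparingByKey());
    }

    // "reduce": apply the reduce logic (here, a sum) to the values of each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> sorted) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : sorted) {
            result.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return result;
    }

    // Runs the three stages over two made-up map outputs.
    static Map<String, Integer> demo() {
        List<Map.Entry<String, Integer>> map1 = new ArrayList<>();
        map1.add(new SimpleEntry<>("b", 1));
        map1.add(new SimpleEntry<>("a", 2));
        List<Map.Entry<String, Integer>> map2 = new ArrayList<>();
        map2.add(new SimpleEntry<>("a", 3));

        List<Map.Entry<String, Integer>> fetched = copy(Arrays.asList(map1, map2));
        sort(fetched);
        return reduce(fetched);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {a=5, b=1}
    }
}
```

In a real job the copy stage can overlap with the map phase, since a reducer can start fetching as soon as individual map tasks finish; the sketch runs the stages strictly in sequence for clarity.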
-
Re: what affects number of reducers launched by hadoop? Joe Stein 2010-07-28, 19:40
mapred.tasktracker.reduce.tasks.maximum is the ceiling on how many reduce tasks you want running per node.

You need to configure *mapred.reduce.tasks* to be more than one, as it defaults to 1 (you are overriding it in your code, which is why it works there).

This value should be somewhere between 0.95 and 1.75 times the maximum number of tasks per node times the number of data nodes.

So if you have 3 data nodes and each is set up with a max of 7 tasks, then configure this between 20 and 36 (0.95 × 21 ≈ 20, 1.75 × 21 ≈ 36.75).

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/
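The rule of thumb above can be worked through in a few lines of Java. This is only a sketch: the class and method names are hypothetical, it assumes the per-node figure is the reduce ceiling (mapred.tasktracker.reduce.tasks.maximum), and the rounding (lower bound rounded up, upper bound rounded down) is an assumption, since the thread does not say how to round.

```java
// Sketch only: names and rounding choices are assumptions, not Hadoop API.
class ReducerEstimate {
    static final double LOW_FACTOR = 0.95;
    static final double HIGH_FACTOR = 1.75;

    // Total reduce slots in the cluster: data nodes times the per-node ceiling.
    static int reduceSlots(int dataNodes, int maxReduceTasksPerNode) {
        return dataNodes * maxReduceTasksPerNode;
    }

    // Returns {low, high}: 0.95 * slots rounded up, 1.75 * slots rounded down.
    static int[] range(int dataNodes, int maxReduceTasksPerNode) {
        int slots = reduceSlots(dataNodes, maxReduceTasksPerNode);
        int low = (int) Math.ceil(LOW_FACTOR * slots);
        int high = (int) Math.floor(HIGH_FACTOR * slots);
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        int[] r = range(3, 7); // 3 data nodes, ceiling of 7 per node
        System.out.println(r[0] + " to " + r[1]); // prints 20 to 36
    }
}
```

A value picked from this range would then go into mapred.reduce.tasks, or be passed to job.setNumReduceTasks(...) as in the original question.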
-
Re: what affects number of reducers launched by hadoop? Abhinay Mehta 2010-07-29, 10:31
Which configuration key controls "the number of maximum tasks per node"?
-
Re: what affects number of reducers launched by hadoop? Vitaliy Semochkin 2010-07-29, 11:20
mapred.tasktracker.reduce.tasks.maximum

PS I found this document of default values very useful:
http://hadoop.apache.org/common/docs/r0.18.3/hadoop-default.html
However, I failed to find its new version for 0.20.2.

Regards,
Vitaliy S
-
Re: what affects number of reducers launched by hadoop? Joe Stein 2010-07-29, 12:54
There is no single setting, but the max tasks would be however many you set for map and reduce tasks per node (so if you set 7 for map and 6 for reduce, then you will not have more than 13 tasks running on the node as a result of the 2 settings).
http://hadoop.apache.org/common/docs/current/cluster_setup.html

You can also set the max number of tasks per JVM so that the JVM is reused for crunching:
http://books.google.com/books?id=bKPEwR-Pt6EC&pg=PA170&lpg=PA170&dq=tom+white+hadoop+jvm&source=bl&ots=kOew2vedyn&sig=oHDtBJQYRbqN06y7ulq7crdvTRs&hl=en&ei=_3hRTJ7UMZTe4AaaoazrAw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBIQ6AEwAA#v=onepage&q&f=false

You need to balance RAM and CPU against everything you are doing when setting these, and try to get the most from your config to bang on the box. Tom White's book has a good reference on this (and everything else) too.

Here are a couple of tips & tricks you might find helpful for your first cluster:
http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
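As a rough illustration of the per-node arithmetic and the RAM balancing mentioned above, the sketch below adds the two per-node ceilings and multiplies by a per-task heap size. All names are hypothetical, and the 200 MB heap figure is only an example input, not a recommendation or a value taken from this thread.

```java
// Sketch only: a back-of-the-envelope budget, with hypothetical names and inputs.
class NodeSlotBudget {

    // Most tasks that can run at once on one node: map ceiling plus reduce ceiling.
    static int maxConcurrentTasks(int mapSlots, int reduceSlots) {
        return mapSlots + reduceSlots;
    }

    // Worst-case task heap on one node if every slot is busy at the same time.
    static long worstCaseHeapMb(int mapSlots, int reduceSlots, int heapPerTaskMb) {
        return (long) maxConcurrentTasks(mapSlots, reduceSlots) * heapPerTaskMb;
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentTasks(7, 6));   // prints 13
        System.out.println(worstCaseHeapMb(7, 6, 200)); // prints 2600
    }
}
```

If the worst-case figure exceeds the memory the node can spare after the daemons and the operating system, lowering one of the two ceilings or the per-task heap is the usual lever.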
-
Re: what affects number of reducers launched by hadoop? Raj V 2010-07-29, 16:37
Vitaliy,

Here are the default values and parameters for 0.20.2:
http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html

The default values in XML format are available in the source tree.

-regards
Raj