-
Reduce Tasks
Mohit Anchlia 2013-02-01, 22:42
Is there a way to specify the maximum number of reduce tasks that a job should spawn
in a Pig script without having to restart the cluster?
-
Re: Reduce Tasks
Harsha 2013-02-01, 22:44
Mohit,
You can use the PARALLEL clause to specify the number of reduce tasks. More info here:
http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
--
Harsha
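As a minimal sketch of the PARALLEL clause mentioned above: the input path, schema, output path, and the value 20 are all hypothetical, chosen only for illustration.

```pig
-- Hypothetical input path and schema, for illustration only.
visits = LOAD 'input/visits' AS (user:chararray, url:chararray);

-- PARALLEL sets the total number of reduce tasks for the job that runs this GROUP.
by_user = GROUP visits BY user PARALLEL 20;

counts = FOREACH by_user GENERATE group AS user, COUNT(visits) AS n;
STORE counts INTO 'output/visit_counts';
```

PARALLEL only has an effect on operators that trigger a reduce phase, such as GROUP, COGROUP, JOIN, DISTINCT, and ORDER.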
-
Re: Reduce Tasks
Mohit Anchlia 2013-02-01, 22:54
Thanks! Is there a downside to reducing the number of reducers? I am trying to
alleviate high CPU usage.

With fewer reducers set through the PARALLEL clause, does it mean that more data is
processed by each reducer, or does it limit how many reducers can be active at one time?
-
Re: Reduce Tasks
Harsha 2013-02-01, 23:15
It's the total number of reducers, not the number of active reducers.
If you specify a lower number, each reducer gets more data to process.
--
Harsha
-
Re: Reduce Tasks
Mohit Anchlia 2013-02-02, 00:53
A slightly different problem: I tried setting mapred.reduce.tasks to 200 with SET in
Pig, but more tasks than that were still launched for the job. Is there any other way
to set the parameter?
-
Re: Reduce Tasks
Alan Gates 2013-02-02, 01:04
Setting mapred.reduce.tasks won't work, as Pig overrides it. See http://pig.apache.org/docs/r0.10.0/perf.html#parallel for info on how to set the number of reducers in Pig.

Alan.
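For reference, the Pig way to set a script-wide reducer count is the default_parallel setting, rather than mapred.reduce.tasks. A small sketch follows; the relation names, paths, and the values 20 and 50 are hypothetical.

```pig
-- Script-wide default number of reducers (20 is an arbitrary example value).
SET default_parallel 20;

-- Hypothetical input path and schema, for illustration only.
visits = LOAD 'input/visits' AS (user:chararray, url:chararray);

-- No PARALLEL clause here, so this job falls back to default_parallel.
names = FOREACH visits GENERATE user;
users = DISTINCT names;
STORE users INTO 'output/users';

-- An explicit PARALLEL clause overrides the default for this operator.
by_url = GROUP visits BY url PARALLEL 50;
url_counts = FOREACH by_url GENERATE group AS url, COUNT(visits) AS n;
STORE url_counts INTO 'output/url_counts';
```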
-
Re: Reduce Tasks
Mohit Anchlia 2013-02-02, 01:07
Sorry, my question was about mapred.map.tasks; I mistakenly specified the wrong
parameter. In Pig I am setting mapred.map.tasks to 200, but more tasks than that are
being executed.
-
Re: Reduce Tasks
Rohini Palaniswamy 2013-02-06, 21:30
The number of maps depends on the number of input splits. mapred.map.tasks
is just a hint, and it is up to the InputFormat to honor it. With Pig, you can try the pig.maxCombinedSplitSize configuration to control the number of maps based on input size. For example, a 1 GB split size can be specified as -Dpig.maxCombinedSplitSize=1073741824

Regards,
Rohini
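The same property can likely also be set inside the script instead of on the command line. The sketch below assumes that SET passes pig.maxCombinedSplitSize through to the job configuration; the input path, schema, and output path are hypothetical.

```pig
-- Assumption: SET forwards this property to the job configuration,
-- equivalent to passing -Dpig.maxCombinedSplitSize=1073741824.
-- Small input splits are combined into splits of up to 1 GB (1073741824 bytes),
-- which lowers the number of map tasks for inputs made of many small splits.
SET pig.maxCombinedSplitSize 1073741824;

-- Hypothetical input path and schema, for illustration only.
events = LOAD 'input/events' AS (line:chararray);
nonempty = FILTER events BY line IS NOT NULL;
STORE nonempty INTO 'output/events_clean';
```

The resulting map count is roughly the total input size divided by the combined split size, so a larger value means fewer, longer-running map tasks.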