Pig user mailing list: Reduce Tasks


Mohit Anchlia 2013-02-01, 22:42
Harsha 2013-02-01, 22:44
Mohit Anchlia 2013-02-01, 22:54
Harsha 2013-02-01, 23:15
Mohit Anchlia 2013-02-02, 00:53
Alan Gates 2013-02-02, 01:04
Mohit Anchlia 2013-02-02, 01:07
The number of maps depends on the number of input splits. mapred.map.tasks
is only a hint, and it is up to the InputFormat whether to honor it. With
Pig, you can try the pig.maxCombinedSplitSize property to control the
number of maps based on input size. For example, a 1 GB split size can be
specified as -Dpig.maxCombinedSplitSize=1073741824.
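For instance, a minimal sketch of both ways to set it (the script name
myscript.pig is hypothetical):

  -- in the Pig script or the grunt shell:
  SET pig.maxCombinedSplitSize 1073741824;

  # or when invoking pig from the command line:
  pig -Dpig.maxCombinedSplitSize=1073741824 myscript.pig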

Regards,
Rohini
On Fri, Feb 1, 2013 at 5:07 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Sorry, my question was about mapred.map.tasks; I mistakenly specified the
> wrong parameter. In Pig I am setting mapred.map.tasks to 200, but more
> tasks than that are being executed.
>
> On Fri, Feb 1, 2013 at 5:04 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
> > Setting mapred.reduce.tasks won't work, as Pig overrides it.  See
> > http://pig.apache.org/docs/r0.10.0/perf.html#parallel for info on how to
> > set the number of reducers in Pig.
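> >
> > For example, a minimal sketch in a Pig script, using the default_parallel
> > property described at that link:
> >
> >   -- script-wide default for the number of reduce tasks:
> >   SET default_parallel 200;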
> >
> > Alan.
> >
> > On Feb 1, 2013, at 4:53 PM, Mohit Anchlia wrote:
> >
> > > A slightly different problem: I tried SET mapred.reduce.tasks 200 in
> > > Pig, but more tasks than that were still launched for the job. Is there
> > > any other way to set the parameter?
> > >
> > > On Fri, Feb 1, 2013 at 3:15 PM, Harsha <[EMAIL PROTECTED]> wrote:
> > >
> > >>
> > >> It's the total number of reducers, not the number of active reducers.
> > >> If you specify a lower number, each reducer gets more data to process.
> > >> --
> > >> Harsha
> > >>
> > >>
> > >> On Friday, February 1, 2013 at 2:54 PM, Mohit Anchlia wrote:
> > >>
> > >>> Thanks! Is there a downside to reducing the number of reducers? I am
> > >>> trying to alleviate high CPU usage.
> > >>>
> > >>> With fewer reducers via the PARALLEL clause, does that mean more data
> > >>> is processed by each reducer, or does it control how many reducers can
> > >>> be active at one time?
> > >>>
> > >>> On Fri, Feb 1, 2013 at 2:44 PM, Harsha <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Mohit,
> > >>>> you can use the PARALLEL clause to specify reduce tasks. More info here:
> > >>>> http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
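> > >>>>
> > >>>> For instance, a minimal sketch (the relation and field names here are
> > >>>> hypothetical); PARALLEL applies to operators that trigger a reduce
> > >>>> phase, such as GROUP, JOIN, ORDER, and DISTINCT:
> > >>>>
> > >>>>   -- run the reduce phase of this GROUP with 20 reducers:
> > >>>>   grouped = GROUP logs BY host PARALLEL 20;
> > >>>>   counts = FOREACH grouped GENERATE group, COUNT(logs);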
> > >>>>
> > >>>> --
> > >>>> Harsha
> > >>>>
> > >>>>
> > >>>> On Friday, February 1, 2013 at 2:42 PM, Mohit Anchlia wrote:
> > >>>>
> > >>>>> Is there a way to specify the max number of reduce tasks that a job
> > >>>>> should spawn in a Pig script, without having to restart the cluster?
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> >
> >
>