
Pig user mailing list: Reduce Tasks


The number of maps depends on the number of input splits; mapred.map.tasks
is just a hint, and it is up to the InputFormat whether to honor it. With Pig,
you can try the pig.maxCombinedSplitSize configuration to control the number
of maps based on input size. For example, a 1 GB split size can be specified
as -Dpig.maxCombinedSplitSize=1073741824.
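
A minimal sketch of the same setting from inside a Pig script; the load
path, relation name, and script name below are placeholders, not from this
thread:

  -- combine input splits so that each map reads up to 1 GB
  -- (1073741824 bytes = 1024^3 = 1 GB)
  SET pig.maxCombinedSplitSize 1073741824;
  raw = LOAD 'input/logs' AS (line:chararray);

Or on the command line when launching the script:

  pig -Dpig.maxCombinedSplitSize=1073741824 myscript.pig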

Regards,
Rohini
On Fri, Feb 1, 2013 at 5:07 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Sorry, my question was about mapred.map.tasks; I mistakenly specified the
> wrong parameter. In Pig I am setting mapred.map.tasks to 200, but more
> tasks are being executed.
>
> On Fri, Feb 1, 2013 at 5:04 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
>
> > Setting mapred.reduce.tasks won't work, as Pig overrides it.  See
> > http://pig.apache.org/docs/r0.10.0/perf.html#parallel for info on how to
> > set the number of reducers in Pig.
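> >
> > A minimal sketch of the two approaches that page describes; the relation
> > and field names here are placeholders, not from this thread:
> >
> >   -- script-wide default number of reducers
> >   SET default_parallel 20;
> >
> >   -- per-operator override on a reduce-side operator
> >   grouped = GROUP data BY key PARALLEL 10;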
> >
> > Alan.
> >
> > On Feb 1, 2013, at 4:53 PM, Mohit Anchlia wrote:
> >
> > > A slightly different problem: I tried SET mapred.reduce.tasks 200 in
> > > Pig, but more tasks were still launched for that job. Is there any
> > > other way to set the parameter?
> > >
> > > On Fri, Feb 1, 2013 at 3:15 PM, Harsha <[EMAIL PROTECTED]> wrote:
> > >
> > >>
> > >> It's the total number of reducers, not the number of active reducers.
> > >> If you specify a lower number, each reducer gets more data to process.
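> > >> As a rough illustration (made-up figures, not from this thread): if
> > >> the maps emit 100 GB in total, PARALLEL 200 gives each reducer about
> > >> 0.5 GB, while PARALLEL 50 gives each about 2 GB.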
> > >> --
> > >> Harsha
> > >>
> > >>
> > >> On Friday, February 1, 2013 at 2:54 PM, Mohit Anchlia wrote:
> > >>
> > >>> Thanks! Is there a downside to reducing the number of reducers? I am
> > >>> trying to alleviate high CPU.
> > >>>
> > >>> With fewer reducers via the PARALLEL clause, does that mean more data
> > >>> is processed by each reducer, or does it control how many reducers
> > >>> can be active at one time?
> > >>>
> > >>> On Fri, Feb 1, 2013 at 2:44 PM, Harsha <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Mohit,
> > >>>> you can use the PARALLEL clause to specify reduce tasks. More info here:
> > >>>> http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features
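> > >>>>
> > >>>> A one-line sketch, with placeholder relation and field names:
> > >>>>
> > >>>>   grouped = GROUP logs BY user PARALLEL 50;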
> > >>>>
> > >>>> --
> > >>>> Harsha
> > >>>>
> > >>>>
> > >>>> On Friday, February 1, 2013 at 2:42 PM, Mohit Anchlia wrote:
> > >>>>
> > >>>>> Is there a way to specify the max number of reduce tasks that a job
> > >>>>> should spawn in a Pig script, without having to restart the cluster?