Re: Nb of reduce tasks when GROUPing
Also, look into the TOP UDF instead of using LIMIT. It can potentially
be a lot faster, and it is cleaner, IMHO.
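
A rough sketch of what that might look like, in case it helps. The relation name, input path, and schema (key, ts, payload) are made up for illustration, since the real field list is not shown in the original post. This assumes there is a column you can rank on; here it keeps the record with the largest ts per key, which is not quite the same as LIMIT 1 (that keeps an arbitrary record).

```pig
-- Hypothetical input and schema; substitute your own relation and fields.
records = LOAD 'input' AS (key:chararray, ts:long, payload:chararray);

grp = GROUP records BY key;

deduped = FOREACH grp {
    -- TOP(n, column_index, bag): column_index is 0-based, so 1 refers to ts here.
    -- Keeps the single tuple with the largest ts within each group.
    newest = TOP(1, 1, records);
    GENERATE FLATTEN(newest);
}
```

As far as I know, the builtin TOP is algebraic, so Pig can apply it in the combiner before the shuffle, which is where the speedup would come from.
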
2013/5/19 Norbert Burger <[EMAIL PROTECTED]>

> Take a look at the PARALLEL clause:
>
> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>
> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I use this request to remove duplicated entries from a set of input files
> > (I cannot use DISTINCT since some fields can be different)
> >
> > grp = GROUP alias BY key;
> > alias = FOREACH grp {
> >   record = LIMIT alias 1;
> >   GENERATE FLATTEN(record) AS ... ;
> > }
> >
> > It appears that this request always generates 1 reducer (I use 0 as the
> > default number of reducers to let Pig decide), whatever the size of my
> > input data.
> >
> > Is this normal behavior? How can I improve my request time by using
> > several reducers?
> >
> > Thanks a lot for your help.
> >
> >
> >
>
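
For reference, the PARALLEL suggestion quoted above could be applied to the original GROUP roughly as follows. This is only a sketch: the value 20 is an arbitrary example, not a recommendation, and the right number depends on the cluster and the data size.

```pig
-- Option 1: a script-wide default for reduce-side operators,
-- usually placed near the top of the script (20 is an example value).
SET default_parallel 20;

-- Option 2: request a reducer count for this one operator only.
-- 'alias' and 'key' are the names used in the original post.
grp = GROUP alias BY key PARALLEL 20;
```

A PARALLEL clause on an operator takes precedence over default_parallel for that operator, so the two can be combined if only one GROUP needs a different setting.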