Re: Nb of reduce tasks when GROUPing
Take a look at the PARALLEL clause:

http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
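
As a rough sketch of how that could look for your script (the relation names, the input/output paths, the schema and the value 10 are all placeholders I made up for illustration, so tune them to your data and cluster):

```pig
-- Hypothetical input path and schema, for illustration only.
records = LOAD 'input' AS (key:chararray, value:chararray);

-- PARALLEL requests a number of reduce tasks for this GROUP.
-- 10 is an arbitrary example value, not a recommendation.
grp = GROUP records BY key PARALLEL 10;

-- Keep one record per key, as in the original script.
deduped = FOREACH grp {
  record = LIMIT records 1;
  GENERATE FLATTEN(record);
}

STORE deduped INTO 'output';
```

If you would rather not annotate each statement, putting SET default_parallel 10; at the top of the script should apply the same setting to every reduce-side operator in it.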

On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I use this query to remove duplicate entries from a set of input files
> (I cannot use DISTINCT since some fields can differ).
>
> grp = GROUP alias BY key;
> alias = FOREACH grp {
>   record = LIMIT  alias 1;
>   GENERATE FLATTEN(record) AS ... ;
> }
>
> It appears that this query always runs with a single reducer (I set the
> default number of reducers to 0 to let Pig decide), whatever the size of
> my input data.
>
> Is this normal behavior? How can I reduce my query time by using
> several reducers?
>
> Thanks a lot for your help.