Pig user mailing list

Re: Nb of reduce tasks when GROUPing
As Jonathan mentioned, TOP should make this particular use case unnecessary.
For other cases, though, the parameters
pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max,
which control Pig's automatic estimate of the reducer count, might be useful:

https://issues.apache.org/jira/browse/PIG-1249
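
A minimal sketch combining both suggestions. It reuses the `alias` and `key` names from the original query. The SET values below are illustrative only, not recommendations. Choosing field 0 as TOP's ordering column is also an assumption: any comparable field works, because any one of the duplicates is acceptable. As I understand the PIG-1249 heuristic, when neither PARALLEL nor default_parallel is set, Pig requests roughly ceil(total input bytes / pig.exec.reducers.bytes.per.reducer) reducers. That estimate is clamped between 1 and pig.exec.reducers.max, with defaults of 1000000000 bytes and 999. With the defaults, any input under about 1 GB therefore gets a single reducer.

```pig
-- Illustrative values only (assumption): ~256 MB of input per reducer, at most 100 reducers.
-- They only influence jobs where no PARALLEL clause or default_parallel is set.
SET pig.exec.reducers.bytes.per.reducer '268435456';
SET pig.exec.reducers.max '100';

-- Keep one record per key with the builtin TOP instead of a nested LIMIT.
-- Field 0 as the ordering column is an assumption; any comparable field works.
grp   = GROUP alias BY key;
dedup = FOREACH grp GENERATE FLATTEN(TOP(1, 0, alias));
```

TOP is algebraic, so Pig can compute partial results in the combiner. That cuts the amount of data shuffled to the reducers compared with a nested LIMIT.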

Norbert

On Tue, May 21, 2013 at 9:23 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Thanks for your reply.
>
> My goal is actually to AVOID using PARALLEL, so that Pig picks a good
> number of reducers by itself.
> That usually works well for me, so I don't understand why it does not
> in this case.
>
> On 19/05/13 15:37, Norbert Burger wrote:
>
>> Take a look at the PARALLEL clause:
>>
>> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
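>>
>> A minimal sketch of the two ways to request reducers explicitly. The
>> count of 20 is an arbitrary example; size it for your data and cluster.
>> SET default_parallel gives a script-wide default. A PARALLEL clause on
>> an individual operator overrides it for that operator only:
>>
>> ```pig
>> -- Script-wide default number of reducers for every reduce phase.
>> SET default_parallel 20;
>>
>> -- Per-operator: request 20 reducers for this GROUP only.
>> grp = GROUP alias BY key PARALLEL 20;
>> ```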
>>
>> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I use this query to remove duplicate entries from a set of input files
>>> (I cannot use DISTINCT because some fields can differ between duplicates):
>>>
>>> grp = GROUP alias BY key;
>>> alias = FOREACH grp {
>>>    record = LIMIT alias 1;
>>>    GENERATE FLATTEN(record) AS ...;
>>> }
>>>
>>> It appears that this query always runs with a single reducer, whatever
>>> the size of my input data (I set the default number of reducers to 0 to
>>> let Pig decide).
>>>
>>> Is this normal behavior? How can I speed up my query by using
>>> several reducers?
>>>
>>> Thanks a lot for your help.