Pig >> mail # user >> Nb of reduce tasks when GROUPing


Vincent Barat 2013-05-17, 14:48
Norbert Burger 2013-05-19, 13:37
Jonathan Coveney 2013-05-19, 22:38
Vincent Barat 2013-05-21, 16:27
Vincent Barat 2013-05-21, 16:16
Vincent Barat 2013-05-21, 13:23
Re: Nb of reduce tasks when GROUPing
As Jonathan mentioned, TOP should obviate this particular use case.  But
for future examples, the parameters
pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max
might be useful:

https://issues.apache.org/jira/browse/PIG-1249
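For reference, both properties can be set at the top of a Pig script. A minimal
sketch with purely illustrative values (tune them to your input size), assuming
the automatic reducer estimation described in PIG-1249:

-- Ask the estimator to allocate roughly one reducer per 500 MB of input
set pig.exec.reducers.bytes.per.reducer 500000000;
-- Cap the number of reducers the estimator may request
set pig.exec.reducers.max 50;

grp = GROUP alias BY key;  -- the estimated reducer count now follows these limits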

Norbert

On Tue, May 21, 2013 at 9:23 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Thanks for your reply.
>
> My goal is actually to AVOID using PARALLEL, to let Pig guess a good
> number of reducers by itself.
> Usually it works well for me, so I don't understand why it does not in
> this case.
>
> On 19/05/13 15:37, Norbert Burger wrote:
>
>> Take a look at the PARALLEL clause:
>>
>> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>>
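For illustration, a minimal sketch of the clause applied to the GROUP from this
thread; the value 10 is arbitrary and should be sized to the input:

grp = GROUP alias BY key PARALLEL 10;  -- forces 10 reduce tasks for this GROUP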
>> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I use this query to remove duplicate entries from a set of input files
>>> (I cannot use DISTINCT since some fields can be different):
>>>
>>> grp = GROUP alias BY key;
>>> alias = FOREACH grp {
>>>    record = LIMIT alias 1;
>>>    GENERATE FLATTEN(record) AS ... ;
>>> };
>>>
>>> It appears that this query always generates 1 reducer (I use 0 as the
>>> default number of reducers to let Pig decide), whatever the size of my
>>> input data.
>>>
>>> Is this normal behavior? How can I improve my query time by using
>>> several reducers?
>>>
>>> Thanks a lot for your help.
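As an aside, the TOP-based rewrite Jonathan alluded to might look roughly like
the following; this is only a sketch, assuming any single tuple per key is
acceptable (the ordering column 0 is an arbitrary choice for pure
de-duplication):

grp = GROUP alias BY key;
-- TOP(n, column, bag) keeps n tuples per group; FLATTEN turns the bag back into rows
deduped = FOREACH grp GENERATE FLATTEN(TOP(1, 0, alias));

Whether this runs on more than one reducer still depends on the reducer
estimation settings mentioned above.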
Vincent Barat 2013-05-21, 19:44
Vincent Barat 2013-05-22, 13:29