Pig >> mail # user >> Nb of reduce tasks when GROUPing


Vincent Barat 2013-05-17, 14:48
Re: Nb of reduce tasks when GROUPing
Take a look at the PARALLEL clause:

http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
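
As a rough sketch of how that could look for the query quoted below (the input path, the schema, the relation names and the value 10 are assumptions for illustration, not taken from your script):

```pig
-- Hypothetical input path and schema, for illustration only.
records = LOAD 'input' AS (key:chararray, value:chararray);

-- PARALLEL requests a number of reduce tasks for this GROUP;
-- 10 is an arbitrary example value, tune it for your cluster.
grp = GROUP records BY key PARALLEL 10;

-- Keep one record per key, as in the original query.
deduped = FOREACH grp {
    first = LIMIT records 1;
    GENERATE FLATTEN(first);
};
```

Alternatively, `SET default_parallel 10;` at the top of the script sets a script-wide default instead of a per-operator value. Whether either one changes the reducer count in your case may depend on your Pig version, so it is worth checking the job counters after the change.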

On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I use this query to remove duplicate entries from a set of input files
> (I cannot use DISTINCT, since some fields can differ):
>
> grp = GROUP alias BY key;
> alias = FOREACH grp {
>   record = LIMIT alias 1;
>   GENERATE FLATTEN(record) AS ... ;
> }
>
> It appears that this query always runs with a single reducer (I set the
> default number of reducers to 0 to let Pig decide), whatever the size of
> my input data.
>
> Is this normal behavior? How can I reduce the query time by using
> several reducers?
>
> Thanks a lot for your help.
>
>
>
Jonathan Coveney 2013-05-19, 22:38
Vincent Barat 2013-05-21, 16:27
Vincent Barat 2013-05-21, 16:16
Vincent Barat 2013-05-21, 13:23
Norbert Burger 2013-05-21, 17:23
Vincent Barat 2013-05-21, 19:44
Vincent Barat 2013-05-22, 13:29