Pig user mailing list: Nb of reduce tasks when GROUPing


Re: Nb of reduce tasks when GROUPing
Also, look into the TOP UDF instead of doing the LIMIT. It can potentially
be a lot faster and is cleaner, IMHO.
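Something like this, as an untested sketch (alias/key are kept from the query
below; the relation name deduped and the column index 0 passed to TOP are
placeholders to adjust for your schema):

grp = GROUP alias BY key;
deduped = FOREACH grp {
  -- TOP(n, column_index, bag) keeps the n tuples with the largest value in
  -- that column of the bag; with n = 1 you get one record per key, which
  -- replaces the nested LIMIT 1
  GENERATE FLATTEN(TOP(1, 0, alias));
};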
2013/5/19 Norbert Burger <[EMAIL PROTECTED]>

> Take a look at the PARALLEL clause:
>
> http://pig.apache.org/docs/r0.7.0/cookbook.html#Use+the+PARALLEL+Clause
>
> On Fri, May 17, 2013 at 10:48 AM, Vincent Barat <[EMAIL PROTECTED]
> >wrote:
>
> > Hi,
> >
> > I use this query to remove duplicated entries from a set of input files
> > (I cannot use DISTINCT, since some fields can differ):
> >
> > grp = GROUP alias BY key;
> > alias = FOREACH grp {
> >   record = LIMIT alias 1;
> >   GENERATE FLATTEN(record) AS ... ;
> > };
> >
> > It appears that this query always generates one reducer (I use 0 as the
> > default number of reducers to let Pig decide), whatever the size of my
> > input data.
> >
> > Is this normal behavior? How can I improve my query's run time by using
> > several reducers?
> >
> > Thanks a lot for your help.
> >
> >
> >
>
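For reference, a sketch of the PARALLEL clause suggested above applied to the
original GROUP (the value 10 is an arbitrary placeholder to tune for your data
size):

-- PARALLEL attaches to the operator that triggers the reduce phase, so it
-- controls the number of reduce tasks used for this GROUP
grp = GROUP alias BY key PARALLEL 10;

In Pig 0.8 and later you can also set a script-wide default with
SET default_parallel.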