Home | About | Sematext search-lucene.com search-hadoop.com
Pig >> mail # user >> Is there a way to set reducer number of pig besides using parallel keyword?


Is there a way to set reducer number of pig besides using parallel keyword?
Hi,
I tried to set the number of reducers like this:

java -Dmapred.reduce.tasks=8 -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main ./L1.pig

but it doesn't work: the number of reducers stays at 40, which is the parallel
value in L1.pig (L1.pig is from PigMix).
If I delete the `parallel 40` in the script, mapred.reduce.tasks becomes 2,
though I expected it to be 1.
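For what it's worth, Pig also has a script-level `default_parallel` setting that is supposed to control the reducer count for all jobs in the script. A minimal sketch (the value 8 here is arbitrary, and as far as I know an explicit `parallel` clause on an operator still takes precedence over it):

```pig
-- Set the default number of reducers for every MapReduce job in this script.
-- An operator-level "parallel N" clause still overrides this default.
set default_parallel 8;

A = load '/user/pig/tests/data/pigmix/page_views';
B = group A by $0;          -- this job should get 8 reducers
store B into 'out';
```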

L1.pig:
-- This script tests reading from a map, flattening a bag of maps, and use of bincond.
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views' using
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)action as action, (map[])page_info as
page_info,
    flatten((bag{tuple(map[])})page_links) as page_links;
C = foreach B generate user,
    (action == 1 ? page_info#'a' : page_links#'b') as header;
D = group C by user parallel 40;
E = foreach D generate group, COUNT(C) as cnt;
store E into 'L1out';

Best,
Hui