Pig >> mail # user >> Is there a way to set reducer number of pig besides using parallel keyword?


Hi,
I tried to set the number of reducers in the following way:
java -Dmapred.reduce.tasks=8 -cp pig.jar:$HADOOP_HOME/conf \
    org.apache.pig.Main ./L1.pig

But it doesn't work: the number of reducers stays at 40, which is the
parallel value set in L1.pig (L1.pig is from PigMix).
If I delete "parallel 40" from the script, the number of reduce tasks becomes
2, although I expected it to be 1.

L1.pig:
-- This script tests reading from a map, flattening a bag of maps, and use
-- of bincond.
register pigperf.jar;
A = load '/user/pig/tests/data/pigmix/page_views'
    using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user, (int)action as action,
    (map[])page_info as page_info,
    flatten((bag{tuple(map[])})page_links) as page_links;
C = foreach B generate user,
    (action == 1 ? page_info#'a' : page_links#'b') as header;
D = group C by user parallel 40;
E = foreach D generate group, COUNT(C) as cnt;
store E into 'L1out';

Best,
Hui