Pig >> mail # user >> Multiple reduce no effect


Multiple reduce no effect
Hi all,

I would like to know why two of the reducers take so long to run,
while the other 14 reducers finish very quickly.

The code is as follows:
SITEG = GROUP ASET BY (platform);
RES = FOREACH SITEG {
    -- distinct ukey values over all rows of the platform
    UV = DISTINCT ASET.ukey;

    -- rows and distinct ukey values where requesturl is 'list'
    LISTTMP = FILTER ASET BY requesturl == 'list';
    LISTUV = DISTINCT LISTTMP.ukey;

    -- same for 'item'
    ITEMTMP = FILTER ASET BY requesturl == 'item';
    ITEMUV = DISTINCT ITEMTMP.ukey;

    -- same for 'jump'
    TAOKETMP = FILTER ASET BY requesturl == 'jump';
    TAOKEUV = DISTINCT TAOKETMP.ukey;

    -- same for 'favorite'
    COLTMP = FILTER ASET BY requesturl == 'favorite';
    COLUV = DISTINCT COLTMP.ukey;

    GENERATE FLATTEN(group), COUNT(UV),
             COUNT(LISTTMP), COUNT(LISTUV),
             COUNT(ITEMTMP), COUNT(ITEMUV),
             COUNT(TAOKETMP), COUNT(TAOKEUV),
             COUNT(COLTMP), COUNT(COLUV);
};

The job runs 16 reducers in total.
The counters for 14 of them look like this:
*File Output Format Counters*
  Bytes Written: 0
*FileSystemCounters*
  FILE_BYTES_READ: 22
  FILE_BYTES_WRITTEN: 124,031
*Map-Reduce Framework*
  Reduce input groups: 0
  Combine output records: 0
  Reduce shuffle bytes: 1,710
  Physical memory (bytes) snapshot: 305,762,304
  Reduce output records: 0
  Spilled Records: 0
  CPU time spent (ms): 8,850
  Total committed heap usage (bytes): 757,137,408
  Virtual memory (bytes) snapshot: 2,749,321,216
  Combine input records: 0
  Reduce input records: 0

The counters for the other two reducers look like this:
*org.apache.pig.PigCounters*
  PROACTIVE_SPILL_COUNT_RECS: 31,154,190
  SPILLABLE_MEMORY_MANAGER_SPILL_COUNT: 3
  PROACTIVE_SPILL_COUNT_BAGS: 1
*File Output Format Counters*
  Bytes Written: 0
*FileSystemCounters*
  FILE_BYTES_READ: 181,863,945
  FILE_BYTES_WRITTEN: 181,987,953
  HDFS_BYTES_WRITTEN: 70
*Map-Reduce Framework*
  Reduce input groups: 1
  Combine output records: 0
  Reduce shuffle bytes: 225,663,351
  Physical memory (bytes) snapshot: 2,039,889,920
  Reduce output records: 1
  Spilled Records: 32,370,070
  Total committed heap usage (bytes): 1,903,493,120
  CPU time spent (ms): 925,630
  Virtual memory (bytes) snapshot: 2,727,219,200
  Combine input records: 0
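
One possible reading of these counters, offered as a guess rather than a confirmed diagnosis: each of the two slow reducers received exactly one input group and the other 14 received none. That would fit platform having only two distinct values, in which case GROUP ... BY (platform) can keep at most two reducers busy however many are configured, and every nested DISTINCT has to run over one very large bag (hence the spills). Assuming that is the cause, and assuming ASET has the fields platform, requesturl and ukey as used in the script, the same figures could perhaps be computed by grouping on finer keys first. The aliases G1, G2, G3, PER_USER, PER_URL, PAIRS, USERS and PLATFORM_UV are hypothetical names introduced only for this sketch:

```pig
-- Sketch only: assumes ASET has the fields platform, requesturl and ukey.

-- Step 1: one row per (platform, requesturl, ukey) with its hit count.
-- COUNT is algebraic, so the combiner can pre-aggregate on the map side.
G1 = GROUP ASET BY (platform, requesturl, ukey);
PER_USER = FOREACH G1 GENERATE
    FLATTEN(group) AS (platform, requesturl, ukey),
    COUNT(ASET) AS hits;

-- Step 2: per (platform, requesturl), total hits and number of distinct ukeys.
G2 = GROUP PER_USER BY (platform, requesturl);
PER_URL = FOREACH G2 GENERATE
    FLATTEN(group) AS (platform, requesturl),
    SUM(PER_USER.hits) AS pv,
    COUNT(PER_USER) AS uv;

-- Step 3: overall distinct ukeys per platform (COUNT(UV) in the original).
PAIRS = FOREACH ASET GENERATE platform, ukey;
USERS = DISTINCT PAIRS;
G3 = GROUP USERS BY platform;
PLATFORM_UV = FOREACH G3 GENERATE group AS platform, COUNT(USERS) AS uv;
```

This produces one row per platform and requesturl instead of one wide row per platform, so PER_URL would still need a FILTER on the four requesturl values of interest and a join with PLATFORM_UV to match the original layout. Rows with a null ukey may also be counted differently than in the nested DISTINCT version.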
Shahab Yunus 2013-09-03, 12:54