Pig >> mail # user >> Multiple reduce no effect


centerqi hu 2013-09-03, 09:39
Re: Multiple reduce no effect
How are the keys distributed in your data? It is quite possible that those
2 reducers are receiving the bulk of your data because of a skewed key
distribution.

From the counters themselves, you can see that the 2 slow reducers have much
higher values than the other 14.
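
One way to check this, as a minimal sketch: count the rows per grouping key
and look at the largest groups. It assumes ASET is already loaded as in the
script quoted below and that platform is the only grouping key. The aliases
KEY_GROUPS, KEY_COUNTS and BY_SIZE are illustrative names, not part of the
original script.

```pig
-- Sketch only: assumes ASET is already loaded with a 'platform' field,
-- as in the quoted script. Alias names here are hypothetical.
KEY_GROUPS = GROUP ASET BY platform;

-- COUNT_STAR counts every tuple in the bag, including ones with null fields.
KEY_COUNTS = FOREACH KEY_GROUPS GENERATE group AS platform,
                                         COUNT_STAR(ASET) AS num_rows;

-- Largest groups first, so any skew is visible at the top of the output.
BY_SIZE = ORDER KEY_COUNTS BY num_rows DESC;
DUMP BY_SIZE;
```

In the counters quoted below, each of the two slow reducers reports
"Reduce input groups" of 1 and the other 14 report 0. That appears consistent
with platform having only two distinct values, in which case adding more
reducers would not spread the work any further, because all rows for one key
go to a single reducer.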

Regards,
Shahab
On Tue, Sep 3, 2013 at 5:39 AM, centerqi hu <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I would like to know why two of the reducers take so long to run while the
> other 14 finish so quickly.
>
> The code is as follows:
>
>
> SITEG = GROUP ASET BY (platform);
> RES = FOREACH SITEG {
>     UV = DISTINCT ASET.ukey;
>
>     LISTTMP = FILTER ASET BY requesturl == 'list';
>     LISTUV = DISTINCT LISTTMP.ukey;
>
>     ITEMTMP = FILTER ASET BY requesturl == 'item';
>     ITEMUV = DISTINCT ITEMTMP.ukey;
>
>     TAOKETMP = FILTER ASET BY requesturl == 'jump';
>     TAOKEUV = DISTINCT TAOKETMP.ukey;
>
>     COLTMP = FILTER ASET BY requesturl == 'favorite';
>     COLUV = DISTINCT COLTMP.ukey;
>
>     GENERATE FLATTEN(group), COUNT(UV), COUNT(LISTTMP), COUNT(LISTUV),
>              COUNT(ITEMTMP), COUNT(ITEMUV), COUNT(TAOKETMP),
>              COUNT(TAOKEUV), COUNT(COLTMP), COUNT(COLUV);
> };
>
> There are 16 reducers in total.
> For 14 of them, the counters look like this:
>
>
> *File Output Format Counters*
>   Bytes Written: 0
> *FileSystemCounters*
>   FILE_BYTES_READ: 22
>   FILE_BYTES_WRITTEN: 124,031
> *Map-Reduce Framework*
>   Reduce input groups: 0
>   Combine output records: 0
>   Reduce shuffle bytes: 1,710
>   Physical memory (bytes) snapshot: 305,762,304
>   Reduce output records: 0
>   Spilled Records: 0
>   CPU time spent (ms): 8,850
>   Total committed heap usage (bytes): 757,137,408
>   Virtual memory (bytes) snapshot: 2,749,321,216
>   Combine input records: 0
>   Reduce input records: 0
>
> The counters for the other two reducers look like this:
>
> *org.apache.pig.PigCounters*
>   PROACTIVE_SPILL_COUNT_RECS: 31,154,190
>   SPILLABLE_MEMORY_MANAGER_SPILL_COUNT: 3
>   PROACTIVE_SPILL_COUNT_BAGS: 1
> *File Output Format Counters*
>   Bytes Written: 0
> *FileSystemCounters*
>   FILE_BYTES_READ: 181,863,945
>   FILE_BYTES_WRITTEN: 181,987,953
>   HDFS_BYTES_WRITTEN: 70
> *Map-Reduce Framework*
>   Reduce input groups: 1
>   Combine output records: 0
>   Reduce shuffle bytes: 225,663,351
>   Physical memory (bytes) snapshot: 2,039,889,920
>   Reduce output records: 1
>   Spilled Records: 32,370,070
>   Total committed heap usage (bytes): 1,903,493,120
>   CPU time spent (ms): 925,630
>   Virtual memory (bytes) snapshot: 2,727,219,200
>   Combine input records: 0
> [EMAIL PROTECTED]
>