Pig, mail # user - Multiple reduce no effect


centerqi hu 2013-09-03, 09:39
Re: Multiple reduce no effect
Shahab Yunus 2013-09-03, 12:54
How are the keys distributed in your data? The 2 slow reducers may be
getting the bulk of your data because of a skewed key distribution.

From the counters themselves, you can see that the 2 reducers have much
higher values than the other 14.
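
One way to check the key distribution is sketched below. It assumes ASET
is the relation from the quoted script and that platform is the grouping
key used there; KEYGRP, KEYCOUNTS and SORTED are hypothetical aliases
introduced only for this check.

```pig
-- Count how many rows fall under each grouping key.
-- ASET and platform come from the quoted script; the other aliases
-- are hypothetical and exist only for this diagnostic.
KEYGRP    = GROUP ASET BY platform;
KEYCOUNTS = FOREACH KEYGRP GENERATE group AS platform, COUNT(ASET) AS n;

-- Largest groups first, so a skewed key is easy to spot.
SORTED    = ORDER KEYCOUNTS BY n DESC;
DUMP SORTED;
```

If this prints only two platform values, that would be consistent with the
counters: each slow reducer reports 1 reduce input group, and the other 14
report 0.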

Regards,
Shahab
On Tue, Sep 3, 2013 at 5:39 AM, centerqi hu <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I would like to know why two of the reducers take so long to run
> while the other 14 finish so quickly.
>
> The code is as follows:
>
>
> SITEG = GROUP ASET BY (platform);
> RES = FOREACH SITEG {
>     UV = DISTINCT ASET.ukey;
>
>     LISTTMP = FILTER ASET BY requesturl == 'list';
>     LISTUV = DISTINCT LISTTMP.ukey;
>
>     ITEMTMP = FILTER ASET BY requesturl == 'item';
>     ITEMUV = DISTINCT ITEMTMP.ukey;
>
>     TAOKETMP = FILTER ASET BY requesturl == 'jump';
>     TAOKEUV = DISTINCT TAOKETMP.ukey;
>
>     COLTMP = FILTER ASET BY requesturl == 'favorite';
>     COLUV = DISTINCT COLTMP.ukey;
>
>     GENERATE FLATTEN(group), COUNT(UV),
>         COUNT(LISTTMP), COUNT(LISTUV),
>         COUNT(ITEMTMP), COUNT(ITEMUV),
>         COUNT(TAOKETMP), COUNT(TAOKEUV),
>         COUNT(COLTMP), COUNT(COLUV);
> };
>
> There are 16 reducers in total.
> The counters for the 14 fast reducers are as follows:
>
>
> *File Output Format Counters*
>   Bytes Written                           0
> *FileSystemCounters*
>   FILE_BYTES_READ                         22
>   FILE_BYTES_WRITTEN                      124,031
> *Map-Reduce Framework*
>   Reduce input groups                     0
>   Combine output records                  0
>   Reduce shuffle bytes                    1,710
>   Physical memory (bytes) snapshot        305,762,304
>   Reduce output records                   0
>   Spilled Records                         0
>   CPU time spent (ms)                     8,850
>   Total committed heap usage (bytes)      757,137,408
>   Virtual memory (bytes) snapshot         2,749,321,216
>   Combine input records                   0
>   Reduce input records                    0
>
> The counters for the other two reducers:
>
> *org.apache.pig.PigCounters*
>   PROACTIVE_SPILL_COUNT_RECS              31,154,190
>   SPILLABLE_MEMORY_MANAGER_SPILL_COUNT    3
>   PROACTIVE_SPILL_COUNT_BAGS              1
> *File Output Format Counters*
>   Bytes Written                           0
> *FileSystemCounters*
>   FILE_BYTES_READ                         181,863,945
>   FILE_BYTES_WRITTEN                      181,987,953
>   HDFS_BYTES_WRITTEN                      70
> *Map-Reduce Framework*
>   Reduce input groups                     1
>   Combine output records                  0
>   Reduce shuffle bytes                    225,663,351
>   Physical memory (bytes) snapshot        2,039,889,920
>   Reduce output records                   1
>   Spilled Records                         32,370,070
>   Total committed heap usage (bytes)      1,903,493,120
>   CPU time spent (ms)                     925,630
>   Virtual memory (bytes) snapshot         2,727,219,200
>   Combine input records                   0
> [EMAIL PROTECTED]
>