
Pig >> mail # user >> Pig multiple groupby problem

Pig multiple groupby problem

I am processing a huge dataset and need to aggregate the data at multiple
levels (columns).

For example: group on columns A, B, C, D, E, F and compute a distinct count
on value1, a distinct count on value2, and SUM(value3).
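To make the target output concrete, here is a minimal Python sketch of the aggregation for a single grouping level. It assumes each record is a dict holding the grouping columns (trimmed to A to C for brevity) plus fields named value1, value2 and value3. These names are illustrative assumptions, not taken from the original job.

```python
from collections import defaultdict

def aggregate_level(rows, keys):
    """Group rows by the given key columns.

    For each group, return (distinct count of value1,
    distinct count of value2, sum of value3).
    """
    distinct1 = defaultdict(set)
    distinct2 = defaultdict(set)
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        distinct1[group].add(row["value1"])
        distinct2[group].add(row["value2"])
        totals[group] += row["value3"]
    return {
        group: (len(distinct1[group]), len(distinct2[group]), totals[group])
        for group in totals
    }

# Hypothetical sample records.
rows = [
    {"A": 1, "B": "x", "C": "p", "value1": 10, "value2": "u", "value3": 5},
    {"A": 1, "B": "x", "C": "q", "value1": 10, "value2": "v", "value3": 7},
    {"A": 2, "B": "y", "C": "p", "value1": 11, "value2": "u", "value3": 1},
]
print(aggregate_level(rows, ["A", "B"]))
# {(1, 'x'): (1, 2, 12), (2, 'y'): (1, 1, 1)}
```

The sum is additive, so it needs only one running number per group. Each distinct count, however, has to keep every value seen for its group.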

I have tried two approaches. In the first, I read the file once and
generate a group by for each level.

For example: GROUP BY (A, B), GROUP BY (A, B, C).
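The first approach (read the input once and group on every level) can be sketched in Python as a single scan that updates the aggregates for every prefix of the key columns. This is roughly the effect Pig's multiquery execution aims for when several GROUP BY outputs share one load. The function name, the prefix-based choice of levels and the field names are assumptions made for illustration.

```python
from collections import defaultdict

def aggregate_all_levels(rows, columns, min_depth=2):
    """Aggregate on every key prefix in a single scan over rows.

    With columns ["A", "B", "C"] and min_depth=2 the levels are
    (A, B) and (A, B, C). Result keys are (level, group) pairs.
    """
    levels = [tuple(columns[:d]) for d in range(min_depth, len(columns) + 1)]
    # One accumulator per (level, group): [value1 set, value2 set, value3 sum].
    acc = defaultdict(lambda: [set(), set(), 0])
    for row in rows:
        for level in levels:
            group = tuple(row[c] for c in level)
            entry = acc[(level, group)]
            entry[0].add(row["value1"])
            entry[1].add(row["value2"])
            entry[2] += row["value3"]
    return {
        key: (len(v1), len(v2), total)
        for key, (v1, v2, total) in acc.items()
    }

# Hypothetical sample records.
rows = [
    {"A": 1, "B": "x", "C": "p", "value1": 10, "value2": "u", "value3": 5},
    {"A": 1, "B": "x", "C": "q", "value1": 10, "value2": "v", "value3": 7},
    {"A": 2, "B": "y", "C": "p", "value1": 11, "value2": "u", "value3": 1},
]
for key, aggregates in aggregate_all_levels(rows, ["A", "B", "C"]).items():
    print(key, aggregates)
```

A single oversized group forces its accumulator to hold all of that group's distinct values. In this sketch, that is the in-memory counterpart of the skew problem described in the question.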

Because I have to compute a DISTINCT inside the FOREACH, this takes too much
time, mostly because of skew. (I have enabled multiquery.)
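One commonly used alternative to a nested DISTINCT is to deduplicate first and count afterwards. You project the (group key, value) pairs, remove duplicates in a separate step, then group by the key and count. In Pig this roughly corresponds to a top-level DISTINCT followed by GROUP and COUNT. That arrangement spreads the deduplication across reducers instead of building one large bag per key. The Python sketch below only checks that the two-step form yields the same counts as the nested form. The function and field names are hypothetical.

```python
from collections import Counter

def distinct_count_nested(rows, keys, field):
    """Collect each group's values, then dedupe per group (like a nested DISTINCT)."""
    groups = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        groups.setdefault(group, []).append(row[field])
    return {group: len(set(values)) for group, values in groups.items()}

def distinct_count_two_step(rows, keys, field):
    """Dedupe (group, value) pairs globally, then count pairs per group."""
    # Step 1: global de-duplication of (group key, value) pairs.
    pairs = {(tuple(row[k] for k in keys), row[field]) for row in rows}
    # Step 2: each surviving pair contributes exactly 1 to its group's count.
    return dict(Counter(group for group, _ in pairs))

# Hypothetical sample records.
rows = [
    {"A": 1, "B": "x", "value2": "u"},
    {"A": 1, "B": "x", "value2": "v"},
    {"A": 1, "B": "x", "value2": "u"},
    {"A": 2, "B": "y", "value2": "u"},
]
nested = distinct_count_nested(rows, ["A", "B"], "value2")
two_step = distinct_count_two_step(rows, ["A", "B"], "value2")
print(nested == two_step)  # True
```

The trade-off is an extra pass over the data in exchange for never materializing a whole group's values in one place.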

In the other approach, I created 8 separate scripts, one for each group by,
but that takes more or less the same time and is not very efficient either.
Could someone please suggest another way?

Thanks in advance.
Dmitriy Ryaboy 2012-08-29, 06:45
Deepak Tiwari 2012-08-29, 20:05
Deepak Tiwari 2012-09-28, 21:40
Dmitriy Ryaboy 2012-09-28, 22:12
Deepak Tiwari 2012-09-28, 22:27
Dmitriy Ryaboy 2012-09-28, 22:58
Deepak Tiwari 2012-09-28, 23:15