Pig >> mail # user >> Pig multiple groupby problem


Pig multiple groupby problem
Hi,

I am processing a huge dataset and need to aggregate the data on multiple
levels (columns).

For example: A, B, C, D, E, F, CalculateDistinctOnValue1,
CalculateDistinctOnValue2, Sum(value3)

I have tried two approaches. In the first, I read the file once and
generate a group-by for each level,

for example group by (A,B) and group by (A,B,C).

I have to do a distinct inside the foreach, and that is taking too much time,
mostly because of skew. (I have enabled multiquery.)
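
For illustration, the first approach might look roughly like the Pig Latin sketch below. The original post does not show the script, so the input path, field names and types, and output paths are all hypothetical, and only two of the grouping levels are written out.

```pig
-- Hypothetical input: the path, delimiter, and schema are assumptions.
data = LOAD 'input/data.tsv' USING PigStorage('\t')
       AS (a:chararray, b:chararray, c:chararray,
           value1:chararray, value2:chararray, value3:long);

-- Level 1: group by (A, B), with nested DISTINCTs inside the FOREACH.
by_ab = GROUP data BY (a, b);
agg_ab = FOREACH by_ab {
    v1 = DISTINCT data.value1;   -- distinct value1 within this group
    v2 = DISTINCT data.value2;   -- distinct value2 within this group
    GENERATE FLATTEN(group) AS (a, b),
             COUNT(v1) AS distinct_value1,
             COUNT(v2) AS distinct_value2,
             SUM(data.value3) AS sum_value3;
};

-- Level 2: group by (A, B, C), same aggregates.
by_abc = GROUP data BY (a, b, c);
agg_abc = FOREACH by_abc {
    v1 = DISTINCT data.value1;
    v2 = DISTINCT data.value2;
    GENERATE FLATTEN(group) AS (a, b, c),
             COUNT(v1) AS distinct_value1,
             COUNT(v2) AS distinct_value2,
             SUM(data.value3) AS sum_value3;
};

-- With multiquery enabled, both STOREs can share a single read of the input.
STORE agg_ab  INTO 'output/by_ab';
STORE agg_abc INTO 'output/by_abc';
```

In a script of this shape, each nested DISTINCT is evaluated per group on the reduce side, so a few very large groups can dominate the runtime, which matches the skew described above.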

In the other approach, I created 8 separate scripts, one for each group-by,
but that takes more or less the same time and is not very efficient.
Could someone please suggest another way?

Thanks in advance.
Deepak
Dmitriy Ryaboy 2012-08-29, 06:45
Deepak Tiwari 2012-08-29, 20:05
Deepak Tiwari 2012-09-28, 21:40
Dmitriy Ryaboy 2012-09-28, 22:12
Deepak Tiwari 2012-09-28, 22:27
Dmitriy Ryaboy 2012-09-28, 22:58
Deepak Tiwari 2012-09-28, 23:15