Deepak Tiwari 2012-08-28, 20:35
Re: Pig multiple groupby problem
Couple of ideas:

1) Do you need exact distinct counts? There are approximate distinct counting approaches that may be appropriate and much more efficient.
2) Can you try with PIG-2888?
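
To make idea 1 concrete, here is a minimal sketch of one approximate distinct counting technique, a K-minimum-values (KMV) estimator. It is written in Python only to show the idea; it is not a Pig API and not necessarily the approach meant above. The class name KMVSketch, the parameter k, and the choice of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as integers in [0, 2**64)


class KMVSketch:
    """Hypothetical K-minimum-values sketch for approximate distinct counts."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []         # negated hashes: a max-heap of the k smallest
        self._members = set()   # hashes currently held, used to skip duplicates

    @staticmethod
    def _hash(value):
        # Use the first 8 bytes of SHA-1 as a 64-bit integer hash.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _add_hash(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # h is smaller than the largest retained hash: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, value):
        self._add_hash(self._hash(value))

    def merge(self, other):
        # Merging keeps the k smallest hashes of the union of both inputs.
        for h in other._members:
            self._add_hash(h)
        return self

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) * HASH_SPACE / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(200000):
        sketch.add(i % 50000)  # 50000 distinct values, each seen four times
    print(round(sketch.estimate()))
```

Each sketch holds at most k hashes and two sketches can be merged, so partial sketches could in principle be built per group on the map side and combined later, rather than shipping every distinct value for a skewed key to a single reducer. The relative error of this estimator shrinks roughly as 1/sqrt(k), so k trades memory for accuracy.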

On Aug 28, 2012, at 1:35 PM, Deepak Tiwari <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am processing a huge dataset and need to aggregate the data on multiple
> levels (columns).
>
> for example A,B,C,D,E,F, CalculateDistinctinctOnValue1,
> CalculateDistinctinctOnValue2, Sum(value3)
>
> I have tried two approaches. In the first, I read the file once and
> generate a group-by for each level,
>
> for example group by (A,B), group by (A,B,C)
>
> I have to do a distinct inside the foreach, and that takes too much time,
> mostly because of skew. (I have enabled multiquery.)
>
> In the second approach, I created 8 separate scripts, one for each
> group-by, but that takes more or less the same time and is not very
> efficient. Could someone please suggest another way?
>
> Thanks in advance.
>
>
> Deepak

Deepak Tiwari 2012-08-29, 20:05
Deepak Tiwari 2012-09-28, 21:40
Dmitriy Ryaboy 2012-09-28, 22:12
Deepak Tiwari 2012-09-28, 22:27
Dmitriy Ryaboy 2012-09-28, 22:58
Deepak Tiwari 2012-09-28, 23:15