Pig >> mail # dev >> CUBE/ROLLUP/GROUPING SETS syntax


Prasanth J 2012-05-28, 05:36
Jonathan Coveney 2012-05-29, 20:05
Prasanth J 2012-05-29, 22:55
Alan Gates 2012-05-30, 16:35
Jonathan Coveney 2012-05-30, 17:43
Alan Gates 2012-05-30, 20:42
Prasanth J 2012-05-31, 00:02
Jonathan Coveney 2012-05-31, 00:10
Prasanth J 2012-06-21, 20:28
Alan Gates 2012-06-21, 21:11
Prasanth J 2012-06-21, 21:52
Dmitriy Ryaboy 2012-06-22, 20:14
Jonathan Coveney 2012-06-21, 20:41
Re: CUBE/ROLLUP/GROUPING SETS syntax
Yeah you are right.  

Thanks
-- Prasanth

On Jun 21, 2012, at 4:41 PM, Jonathan Coveney wrote:

> Just to make sure I understand this correctly, is
>
> out = CUBE rel BY CUBE(a,b,c), ROLLUP(c,d), CUBE(e,f);
>
> equivalent to:
>
> out1 = CUBE rel BY (a,b,c);
> out2 = ROLLUP rel BY (c,d);
> out3 = CUBE rel BY (e,f);
>
> out = CROSS out1, out2, out3;
>
> ?
>
> 2012/6/21 Prasanth J <[EMAIL PROTECTED]>
>
>> Hello all
>>
>> I initially implemented ROLLUP as a separate operation with the following
>> syntax
>>
>> a = ROLLUP inp BY (x,y);
>>
>> which does the same thing as CUBE (inserting foreach + group-by in logical
>> plan) except that it uses RollupDimensions UDF. But the issue with this
>> approach is that we cannot mix CUBE and ROLLUP operations together in the
>> same syntax which is a typical case. SQL/Oracle supports using CUBE and
>> ROLLUP together like
>>
>> GROUP BY CUBE(a,b,c), ROLLUP(c,d), CUBE(e,f);
>>
>> so I modified the Pig grammar to support similar usage. So now we can
>> use a syntax similar to SQL
>>
>> out = CUBE rel BY CUBE(a,b,c), ROLLUP(c,d), CUBE(e,f);
>>
>> In this approach, the logical plan should introduce a cartesian product
>> between bags generated by CUBE(a,b,c), ROLLUP(c,d) and CUBE(e,f) for
>> generating the final output. But I read from the documentation (
>> http://pig.apache.org/docs/r0.10.0/basic.html#cross) that CROSS operator
>> is an expensive operator and should be used sparingly.
>>
>> Is there any other way to achieve the cartesian product in a less
>> expensive way? Also, does anyone have thoughts about this new syntax?
>>
>> Thanks
>> -- Prasanth
>>
>> On May 30, 2012, at 8:10 PM, Jonathan Coveney wrote:
>>
>>> As far as the underlying implementation, if they all use the same
>>> optimizations that you use in cube, then it can be LOCube. If they have
>>> their own optimizations etc (or could), it may be worth them having their
>>> own Logical operators (which might just be LOCube with flags for the time
>>> being) that allows us more flexibility. But I suppose that's between you,
>>> eclipse, and your GSOC mentor.
>>>
>>> 2012/5/30 Prasanth J <[EMAIL PROTECTED]>
>>>
>>>> Thanks Alan and Jon for expressing your views.
>>>>
>>>> I agree with Jon's point: if the syntax contains CUBE, then the user expects
>>>> it to perform a CUBE operation. So Jon's syntax seems more meaningful and
>>>> concise:
>>>>
>>>> rel = CUBE rel BY (dims);
>>>> rel = ROLLUP rel BY (dims);
>>>> rel = GROUPING_SET rel BY (dims);
>>>>
>>>> The two reasons I prefer not to use the SQL syntax are:
>>>> 1) I do not want to break into the existing Group operator implementation :)
>>>> 2) The syntax gets longer in the case of partial hierarchical cubing/rollups
>>>> For example:
>>>>
>>>> rel = GROUP rel BY dim0, ROLLUP(dim1,dim2,dim3), ROLLUP(dim4,dim5,dim6),
>>>> ROLLUP(dim7,dim8,dim9);
>>>>
>>>> whereas the same thing can be expressed as
>>>>
>>>> rel = ROLLUP rel BY dim0,
>>>> (dim1,dim2,dim3),(dim4,dim5,dim6),(dim7,dim8,dim9);
>>>>
>>>> Thanks Alan for pointing out the way for independently managing the
>>>> operators in parser and logical/physical plan. So for all these operators
>>>> (CUBE, ROLLUP, GROUPING_SET) I can just generate LOCube and use flags to
>>>> differentiate between these three operations.
>>>>
>>>> But, yes we are proliferating operators in this case.
>>>>
>>>> Thanks
>>>> -- Prasanth
>>>>
>>>> On May 30, 2012, at 4:42 PM, Alan Gates wrote:
>>>>
>>>>>
>>>>> On May 30, 2012, at 10:43 AM, Jonathan Coveney wrote:
>>>>>
>>>>>> I was going to say the same thing Alan said w.r.t. operators: operators in
>>>>>> the grammar can correspond to whatever logical and physical operators you
>>>>>> want.
>>>>>>
>>>>>> As far as the principle of least astonishment compared to SQL... Pig is
>>>>>> already pretty astonishing. I don't know why we would bend over backwards
>>>>>> to make the syntax so similar in this case when even getting to the point
>>>>>
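
For reference, here is a minimal Python sketch (not Pig code, and not the patch discussed in this thread) of how the grouping sets in the CUBE(a,b,c), ROLLUP(c,d), CUBE(e,f) example combine under the SQL-style semantics mentioned above. It assumes that CUBE yields every subset of its dimensions, that ROLLUP yields every prefix, and that mixing clauses takes the cartesian product of their grouping sets. The helper names cube_sets, rollup_sets and combine are hypothetical.

```python
from itertools import combinations, product


def cube_sets(dims):
    """All 2^n subsets of dims, largest first (grouping sets of CUBE)."""
    return [combo
            for r in range(len(dims), -1, -1)
            for combo in combinations(dims, r)]


def rollup_sets(dims):
    """The n+1 prefixes of dims, longest first (grouping sets of ROLLUP)."""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]


def combine(*clauses):
    """Cartesian product of the clauses' grouping sets.

    Each pick (one grouping set per clause) is concatenated; a dimension
    named in more than one clause (here, c) is kept only once.
    """
    return [tuple(dict.fromkeys(sum(pick, ())))
            for pick in product(*clauses)]


combined = combine(
    cube_sets(("a", "b", "c")),   # 8 grouping sets
    rollup_sets(("c", "d")),      # 3 grouping sets
    cube_sets(("e", "f")),        # 4 grouping sets
)
print(len(combined))  # 8 * 3 * 4 = 96 combinations
```

The product is taken over small per-clause lists of dimension combinations, not over the rows of rel, so the sketch says nothing about the cost of a CROSS between full relations.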
Jonathan Coveney 2012-06-21, 20:50