Pig >> mail # dev >> CUBE/ROLLUP/GROUPING SETS syntax


Prasanth J 2012-05-28, 05:36
Jonathan Coveney 2012-05-29, 20:05
Re: CUBE/ROLLUP/GROUPING SETS syntax
Thanks Jonathan for looking into it and for your suggestions.

The reason I proposed a clause rather than a separate operator was to avoid adding more operators to the grammar.
Adding ROLLUP and GROUPING_SET as operators of their own would require separate logical operators, which adds complexity. I plan to keep everything under the CUBE operator, so only the LOCube and POCube operators will be added. And as you and Dmitriy mentioned, a HAVING clause serves the same purpose as FILTER, so we do not need a separate HAVING clause.

Here is a quick recap of the cube-related operations and the syntax options for each. I am also adding partial CUBE and partial ROLLUP to this discussion.

1) CUBE

Current syntax:
alias = CUBE rel BY (a, b);

The following group-bys will be computed:
(a, b)
(a)
(b)
()
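
As an illustration only (Python, not Pig, and not the proposed implementation), the CUBE expansion listed above is every subset of the dimension list. The function name `cube_groupings` is hypothetical:

```python
from itertools import combinations

def cube_groupings(dims):
    """Return every subset of dims, largest first, preserving dimension order."""
    dims = tuple(dims)
    return [combo
            for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

# The four group-bys listed for CUBE rel BY (a, b)
groupings = cube_groupings(("a", "b"))
```

For n dimensions this yields 2^n group-bys, which is the motivation for the partial and rollup variants.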

2) Partial CUBE

Proposed syntax:
alias = CUBE rel BY a, (b, c);

The following group-bys will be computed:
(a, b, c)
(a, b)
(a, c)
(a)
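
A minimal sketch of the partial CUBE expansion, assuming the dimensions outside the parentheses appear in every group-by and only the parenthesized ones are cubed. This is Python for illustration; `partial_cube_groupings` is a hypothetical name:

```python
from itertools import combinations

def partial_cube_groupings(fixed, cubed):
    """Prefix every subset of the cubed dimensions with the fixed dimensions."""
    fixed, cubed = tuple(fixed), tuple(cubed)
    return [fixed + combo
            for size in range(len(cubed), -1, -1)
            for combo in combinations(cubed, size)]

# CUBE rel BY a, (b, c): a is always present, (b, c) is cubed
groupings = partial_cube_groupings(("a",), ("b", "c"))
```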

3) ROLLUP

Proposed syntax 1:
alias = CUBE rel BY ROLLUP(a, b);

Proposed syntax 2:
alias = CUBE rel BY (a::b);

Proposed syntax 3:
alias = ROLLUP rel BY (a, b);

The following group-bys will be computed:
(a, b)
(a)
()
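
For comparison with CUBE, ROLLUP keeps only the prefixes of the dimension list, so the order of the dimensions matters. A small Python sketch (illustrative only; `rollup_groupings` is a hypothetical name):

```python
def rollup_groupings(dims):
    """Return each prefix of dims, from the full list down to the empty grouping."""
    dims = tuple(dims)
    return [dims[:size] for size in range(len(dims), -1, -1)]

# The three group-bys listed for ROLLUP over (a, b)
groupings = rollup_groupings(("a", "b"))
```

For n dimensions this produces n + 1 group-bys. A partial ROLLUP would, under the same assumption as partial CUBE, prefix each of these with the fixed dimensions.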

4) Partial ROLLUP

Proposed syntax 1:
alias = CUBE rel BY a, ROLLUP(b, c);

Proposed syntax 2:
alias = CUBE rel BY (a, b::c);

Proposed syntax 3:
alias = ROLLUP rel BY a, (b, c);

The following group-bys will be computed:
(a, b, c)
(a, b)
(a)

5) GROUPING SETS

Proposed syntax 1:
alias = CUBE rel BY GROUPING SETS((a), (b, c), (c));

Proposed syntax 2:
alias = CUBE rel BY {(a), (b, c), (c)};

Proposed syntax 3:
alias = GROUPING_SET rel BY ((a), (b, c), (c));

The following group-bys will be computed:
(a)
(b, c)
(c)
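
To make the semantics concrete, here is a rough Python sketch of what evaluating these three grouping sets could produce on toy data. It assumes the SQL convention of reporting dimensions that are not part of a grouping set as null (`None` here); the thread does not say how Pig would represent them, and the function name and sample rows are made up for illustration:

```python
from collections import Counter

def count_by_grouping_sets(rows, dims, grouping_sets):
    """Count rows per group for each grouping set.

    Dimensions absent from a grouping set are reported as None
    (an assumed convention, borrowed from SQL).
    """
    counts = Counter()
    for grouping in grouping_sets:
        for row in rows:
            key = tuple(row[d] if d in grouping else None for d in dims)
            counts[key] += 1
    return counts

# Made-up sample rows
rows = [
    {"a": 1, "b": "x", "c": "p"},
    {"a": 1, "b": "y", "c": "p"},
    {"a": 2, "b": "x", "c": "q"},
]
result = count_by_grouping_sets(rows, ("a", "b", "c"),
                                [("a",), ("b", "c"), ("c",)])
```

Each input row contributes once per grouping set, so the output is equivalent to the union of three separate group-bys. Note that using `None` as the placeholder is ambiguous if the data itself contains nulls.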

Please vote for syntax 1, 2 or 3 so that we can come to a consensus before I start hacking the grammar file.

Thanks
-- Prasanth

On May 29, 2012, at 4:05 PM, Jonathan Coveney wrote:

> Hey Prashanth, happy hacking.
>
> My opinion:
>
> CUBE:
>
> alias = CUBE rel BY (a,b,c);
>
>
> I like that syntax. It's unambiguous what is going on.
>
>
> ROLLUP:
>
>
> alias = CUBE rel BY ROLLUP(a,b,c);
>
>
> I never liked that syntax in SQL. I suggest we just do what we did with CUBE. IE
>
>
> alias = ROLLUP rel BY (a,b,c);
>
>
> GROUPING SETS:
>
>
> alias = CUBE rel BY GROUPING SETS((a,b),(b),());
>
>
> I don't like this. The cube vs. grouping sets is confusing to me. maybe
> following the
> same pattern you could do something like:
>
> alias = GROUPING_SET rel BY ((a,b),(b),());
>
> As far as having, is there an optimization that can be done with a HAVING
> clause that can't be done based on the logical plan that comes afterwards?
> That seems odd to me. Since you have to materialize the result anyway,
> can't the having clause just be a FILTER that comes after the cube? I don't
> know why we need a special syntax.
>
> My opinion. Forgive janky formatting, gmail + paste = pain.
> Jon
>
> 2012/5/27 Prasanth J <[EMAIL PROTECTED]>
>
>> Hello everyone
>>
>> I am looking for feedback from the community about the syntax for
>> CUBE/ROLLUP/GROUPING SETS operations in pig.
>> I am moving the discussion from JIRA to dev-list so that everyone can
>> share their opinion for operator syntax. Please have a look at the syntax
>> proposal at the link below and let me know your opinion
>>
>>
>> https://issues.apache.org/jira/browse/PIG-2167?focusedCommentId=13277644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13277644
>>
>> Thanks
>> -- Prasanth
>>
>>
