Pig >> mail # user >> optimization for data cube


Re: optimization for data cube
From the 0.11 release onwards, Pig natively supports the CUBE operator.

Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube

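For your case the query can be written as follows (the relations are named data and result here, because input and output are reserved keywords in Pig Latin and cannot be used as aliases):

cubed = CUBE data BY CUBE(group_a,group_b,group_c);
result = FOREACH cubed GENERATE FLATTEN(group) as (group_a,group_b,group_c), FLATTEN(cube.value) as value;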

More examples can be found in the documentation.
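
Since CUBE already groups the rows by every combination of the dimensions, a separate group-by step afterwards is usually not needed. A minimal sketch of the aggregated form, assuming the goal is a sum of value per combination; the path 'cube_input', the alias names, and the column types are hypothetical and not taken from the original question:

```pig
-- Hypothetical input: one (group_a, group_b, group_c, value) tuple per line.
records = LOAD 'cube_input' AS (group_a:chararray, group_b:chararray, group_c:chararray, value:long);

-- CUBE yields one group per dimension combination (2^3 = 8 per distinct key).
cubed = CUBE records BY CUBE(group_a, group_b, group_c);

-- Aggregate inside each group instead of flattening and re-grouping.
totals = FOREACH cubed GENERATE FLATTEN(group) AS (group_a, group_b, group_c), SUM(cube.value) AS total;

DUMP totals;
```

Note that a rolled-up dimension appears as null in the output rather than as the literal string "all", and that the full combination (group_a,group_b,group_c) is also emitted alongside the seven partial ones.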

Thanks
-- Prasanth

On Apr 2, 2013, at 11:34 PM, Haitao Yao <[EMAIL PROTECTED]> wrote:

> Hi, all
> I have a tuple like this:
> (group_a,group_b,group_c,value)
>
> and I want to calculate the values in a data cube way, which means I want to generate new tuples from the original one:
>
> (all,all,all,value)
> (group_a,all,all,value)
> (all,group_b,all,value)
> (group_a,group_b,all,value)
> (all,all,group_c,value)
> (group_a,all,group_c,value)
> (all,group_b,group_c,value)
>
> and then group by ($0, $1, $2).
> How can I do this? I've written an Eval function, but it cannot generate more tuples from one tuple.
>
>
> thanks.
>
>
> Haitao Yao
> [EMAIL PROTECTED]
> weibo: @haitao_yao
> Skype:  haitao.yao.final
>
