Re: optimization for data cube
From the 0.11 release onwards, Pig natively supports the CUBE operator.

The documentation for the CUBE operator is here: http://pig.apache.org/docs/r0.11.1/basic.html#cube

For your case, the query can be written as follows (input and output are reserved keywords in Pig, so the aliases are named data and result):

cubed = CUBE data BY CUBE(group_a,group_b,group_c);
result = FOREACH cubed GENERATE FLATTEN(group) as (group_a,group_b,group_c), FLATTEN(cube.value) as value;
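
If the goal is one aggregate per combination rather than the raw values, a fuller script might look like the sketch below. The input path, the comma delimiter, the field types, the alias names, and the choice of SUM are all assumptions made for illustration; only the CUBE and FLATTEN usage follows the linked documentation. Note that CUBE marks a rolled-up dimension with null in the group tuple rather than with a literal "all".

```pig
-- Hypothetical input path and schema; adjust both to match the real data.
records = LOAD 'input_data' USING PigStorage(',')
          AS (group_a:chararray, group_b:chararray, group_c:chararray, value:long);

-- CUBE produces one group for every combination of the three dimensions.
-- Each row has a 'group' tuple (the dimension values, null where rolled up)
-- and a 'cube' bag holding the input tuples that fall into that group.
cubed = CUBE records BY CUBE(group_a, group_b, group_c);

-- Aggregate per combination. SUM is an assumption about the desired aggregate.
totals = FOREACH cubed GENERATE
             FLATTEN(group) AS (group_a, group_b, group_c),
             SUM(cube.value) AS total_value;

DUMP totals;
```

Because CUBE already performs the grouping, no separate GROUP BY step is needed afterwards.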

More examples can be found in the documentation.

Thanks
-- Prasanth

On Apr 2, 2013, at 11:34 PM, Haitao Yao <[EMAIL PROTECTED]> wrote:

> Hi, all
> I have a tuple like this:
> (group_a,group_b,group_c,value)
>
> and I want to compute the values as a data cube, which means generating new tuples from the original one:
>
> (all,all,all,value)
> (group_a,all,all,value)
> (all,group_b,all,value)
> (group_a,group_b,all,value)
> (all,all,group_c,value)
> (group_a,all,group_c,value)
> (all,group_b,group_c,value)
>
> and then group by ($0, $1, $2).
> How can I do this? I've written an Eval function, but it cannot generate more than one tuple from a single input tuple.
>
>
> thanks.
>
>
> Haitao Yao
> [EMAIL PROTECTED]
> weibo: @haitao_yao
> Skype:  haitao.yao.final
>