Pig, mail # user - optimization for data cube


Haitao Yao 2013-04-03, 03:34
Prasanth J 2013-04-03, 05:19
Re: optimization for data cube
Haitao Yao 2013-04-03, 06:07
Thank you very much.
We're using Pig-0.9.2. I tried upgrading to 0.11, but it took an unacceptably long time to compile my large Pig script, whereas Pig-0.9.2 compiles it fine. I still haven't found the reason.

So I think I need to port the cube operation to 0.9.2 myself.
Haitao Yao
[EMAIL PROTECTED]
weibo: @haitao_yao
Skype:  haitao.yao.final

On 2013-4-3, at 1:19 PM, Prasanth J <[EMAIL PROTECTED]> wrote:

> From the 0.11 release onwards, Pig natively supports the CUBE operator.
>
> Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube
>
> For your case the query can be represented as
>
> cubed = CUBE input BY CUBE(group_a,group_b,group_c);
> output = FOREACH cubed GENERATE FLATTEN(group) as (group_a,group_b,group_c), FLATTEN(cube.value) as value;
>
> More examples can be found in the documentation.
>
> Thanks
> -- Prasanth
>
> On Apr 2, 2013, at 11:34 PM, Haitao Yao <[EMAIL PROTECTED]> wrote:
>
>> Hi, all
>> I have a tuple like this:
>> (group_a,group_b,group_c,value)
>>
>> and I want to calculate the values in a data-cube fashion, which means I want to generate new tuples from the original one:
>>
>> (all,all,all,value)
>> (group_a,all,all,value)
>> (all,group_b,all,value)
>> (group_a,group_b,all,value)
>> (all,all,group_c,value)
>> (group_a,all,group_c,value)
>> (all,group_b,group_c,value)
>>
>> and then group by ($0, $1, $2).
>> How can I do this? I've written an Eval function, but it cannot generate multiple tuples from one tuple.
>>
>>
>> thanks.
>>
>>
>> Haitao Yao
>> [EMAIL PROTECTED]
>> weibo: @haitao_yao
>> Skype:  haitao.yao.final
>>
>
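
For readers following the thread, the expansion asked about in the original question can be sketched outside Pig. The Python below is only an illustration of the logic, not code from either poster and not Pig's CUBE implementation. The names cube_expand, cube_sum and ALL are hypothetical, and summing is assumed as the aggregate because the thread does not say which one is wanted. Note that a full cube over three dimensions has 2^3 = 8 combinations: the seven listed in the question plus the original tuple itself.

```python
from collections import defaultdict
from itertools import product

ALL = "all"  # hypothetical marker for a rolled-up dimension, as in the question


def cube_expand(row, num_dims=3):
    """Yield one tuple per combination of kept / rolled-up dimensions."""
    dims, rest = row[:num_dims], row[num_dims:]
    # For every dimension, choose either its real value or the ALL marker.
    for combo in product(*[(d, ALL) for d in dims]):
        yield combo + rest


def cube_sum(rows, num_dims=3):
    """Expand every row, group by the dimension columns, and sum the value."""
    totals = defaultdict(int)
    for row in rows:
        for expanded in cube_expand(row, num_dims):
            # Assumes a single numeric value column right after the dimensions.
            totals[expanded[:num_dims]] += expanded[num_dims]
    return dict(totals)


if __name__ == "__main__":
    rows = [("a", "x", "p", 1), ("a", "y", "p", 2)]
    for key, total in sorted(cube_sum(rows).items()):
        print(key, total)
```

On Pig 0.9.2, one way to get the same effect would presumably be an Eval function that returns a bag holding these expanded tuples, which is then un-nested with FLATTEN inside a FOREACH before the GROUP. That is a suggestion under the assumptions above, not something confirmed in this thread.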