Pig, mail # user - How to combine multiple group by


Re: How to combine multiple group by
Alan Gates 2012-05-16, 00:39
Pig will auto-combine these for you. In the script example you give, Pig should already be combining both group bys into a single MapReduce job. You can check this by running explain on it.

Alan.
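A minimal sketch of the script from the original question, restructured so Pig can plan both branches together. It assumes the same input file `emp` and schema. The output paths `top3_by_empid` and `top3_by_school` are hypothetical. Each branch gets its own aliases, and `dump` is replaced with `store`, because the Pig documentation advises against `DUMP` in scripts that rely on multi-query execution.

```pig
-- Load once; both branches below read from e1.
e1 = load 'emp' using PigStorage() as (empid, school, district, score);

-- Branch 1: top 3 employees by average score.
by_emp    = group e1 by empid;
emp_avg   = foreach by_emp generate group, AVG(e1.score) as s;
emp_rank  = order emp_avg by s desc;
emp_top3  = limit emp_rank 3;

-- Branch 2: top 3 schools by average score.
by_school   = group e1 by school;
school_avg  = foreach by_school generate group, AVG(e1.score) as s;
school_rank = order school_avg by s desc;
school_top3 = limit school_rank 3;

-- Use STORE rather than DUMP; per the Pig docs, DUMP can disable multi-query execution.
store emp_top3 into 'top3_by_empid';
store school_top3 into 'top3_by_school';
```

To follow Alan's suggestion, save this as a script (the name `top3.pig` is hypothetical) and run `explain -script top3.pig;` from the Grunt shell. The MapReduce plan section of the output lists each job and the operators it contains. The `order` steps may still add jobs of their own, and the plan will show them.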

On May 15, 2012, at 3:11 PM, shan s wrote:

> Thanks Bill.
>
> My objective is to improve performance, so I do want to combine the logic.
> If we were to do this in Java, we could do it in a single foreach.
>
> Will the macro help in this regard, or will it just act as a code generator?
>
> On Tue, May 15, 2012 at 8:30 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> You can combine multiple relations using the UNION operator. If you're
>> trying to combine logic, you can use a macro for e2-e5 below that takes
>> (e1, empid) or (e1, school). See the example here:
>>
>> http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/
>>
>> On Tue, May 15, 2012 at 6:50 AM, shan s <[EMAIL PROTECTED]> wrote:
>>
>>> How can I combine multiple group bys that are performed on essentially
>>> the same relation?
>>> In the case below, can I do this in a single foreach?
>>>
>>> e1 =  load 'emp' using PigStorage() as (empid, school, district, score);
>>>
>>> -- top 3 employees by average score
>>> e2 = group e1 by empid;
>>> e3 = foreach e2 generate group, AVG(e1.score) as s;
>>> e4 = order e3 by s desc;
>>> e5 = limit e4 3;
>>> dump e5;
>>>
>>> -- top 3 schools by average score (reuses aliases e2-e5)
>>> e2 = group e1 by school;
>>> e3 = foreach e2 generate group, AVG(e1.score) as s;
>>> e4 = order e3 by s desc;
>>> e5 = limit e4 3;
>>> dump e5;
>>>
>>> Thank you,
>>> Prashant.
>>>
>>
>>
>>
>> --
>> *Note that I'm no longer using my Yahoo! email address. Please email me at
>> [EMAIL PROTECTED] going forward.*
>>
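Bill's macro suggestion can be sketched as follows. This assumes Pig 0.9 or later, where macros were introduced, and the same `emp` input as the original question. The macro name `top3_avg`, its parameters `rel` and `key`, and the output paths are hypothetical. A macro is expanded inline wherever it is called, so on its own it removes duplicated code rather than changing the execution plan. Any merging of jobs still comes from Pig's optimizer, as Alan notes.

```pig
-- Hypothetical macro: top 3 values of $key in $rel, ranked by average score.
define top3_avg(rel, key) returns result {
    by_key  = group $rel by $key;
    avgs    = foreach by_key generate group, AVG($rel.score) as s;
    ranked  = order avgs by s desc;
    $result = limit ranked 3;
};

e1 = load 'emp' using PigStorage() as (empid, school, district, score);

-- Each call expands into its own group/foreach/order/limit chain (e2-e5 in the question).
emp_top3    = top3_avg(e1, empid);
school_top3 = top3_avg(e1, school);

store emp_top3 into 'top3_by_empid';
store school_top3 into 'top3_by_school';
```

Pig gives the aliases inside the macro body unique names for each expansion, so calling `top3_avg` twice in one script does not cause alias clashes.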