Pig >> mail # user >> How to combine multiple group by


Re: How to combine multiple group by
Pig will auto-combine these for you. In the script example you gave, Pig should already be combining both group bys into a single MapReduce job. You can check this by running explain on it.

Alan.
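One way to check Alan's point is to put both pipelines from the question into one script and ask Pig for the plan of the whole script. The sketch below rewrites the original script under a few assumptions: it gives each pipeline its own aliases, declares field types (the original left them untyped), and uses STORE instead of DUMP, because Pig's multi-query documentation advises that DUMP can prevent multi-query execution. The file name `top_scores.pig` and the output directories are hypothetical.

```pig
-- top_scores.pig (hypothetical file name)
-- Field types are assumed; the original script left them untyped.
e1 = LOAD 'emp' USING PigStorage() AS (empid:chararray, school:chararray, district:chararray, score:double);

-- Pipeline 1: top 3 employees by average score
by_emp     = GROUP e1 BY empid;
emp_avg    = FOREACH by_emp GENERATE group, AVG(e1.score) AS s;
emp_sorted = ORDER emp_avg BY s DESC;
emp_top3   = LIMIT emp_sorted 3;

-- Pipeline 2: top 3 schools by average score
by_school     = GROUP e1 BY school;
school_avg    = FOREACH by_school GENERATE group, AVG(e1.score) AS s;
school_sorted = ORDER school_avg BY s DESC;
school_top3   = LIMIT school_sorted 3;

-- STORE (not DUMP) both results so multi-query execution can plan them together
STORE emp_top3 INTO 'top3_by_empid';
STORE school_top3 INTO 'top3_by_school';
```

Running Pig's EXPLAIN operator with its `-script` option against this file prints the logical, physical and MapReduce plans for the whole script, which should show whether the two GROUP operations were merged into one job. The ORDER BY steps typically add further jobs of their own, so expect more than one job overall.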

On May 15, 2012, at 3:11 PM, shan s wrote:

> Thanks Bill.
>
> My objective is to improve performance. So I do want to combine the logic.
> If we were to do this in Java, we could do it in a single foreach.
>
> Will the macro help in this regard? Or will it just act as a code generator?
>
> On Tue, May 15, 2012 at 8:30 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> You can combine multiple relations using the UNION operator. If you're
>> trying to combine logic, you can use a macro to do e2-e5 below that takes
>> (e1, empid) or (e1, school). See the example here:
>>
>> http://hortonworks.com/blog/new-apache-pig-features-part-1-macro/
>>
>> On Tue, May 15, 2012 at 6:50 AM, shan s <[EMAIL PROTECTED]> wrote:
>>
>>> How can I combine multiple group bys that are performed on essentially the
>>> same relation?
>>> In the case below, can I do this in a single foreach?
>>>
>>> e1 =  load 'emp' using PigStorage() as (empid, school, district, score);
>>>
>>> e2 = group e1 by empid;
>>> e3 = foreach e2 generate group, AVG(e1.score) as s;
>>> e4 = order e3 by s desc;
>>> e5 = limit e4 3;
>>> dump e5;
>>>
>>> e2 = group e1 by school;
>>> e3 = foreach e2 generate group, AVG(e1.score) as s;
>>> e4 = order e3 by s desc;
>>> e5 = limit e4 3;
>>> dump e5;
>>> Thank You,
>>> Prashant.
>>>
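Bill's macro suggestion could look roughly like the sketch below. The macro name `top3_by_avg`, its parameters, the field types, and the output paths are all hypothetical. As far as I understand Pig macros, they are expanded inline before the script is planned. That means a macro removes the duplicated e2-e5 code but does not by itself change the execution plan, which answers the "code generator" question. Any saving in MapReduce jobs would come from Pig's multi-query optimization, as Alan describes.

```pig
-- Hypothetical macro: top 3 values of $key ranked by average score.
-- Pig substitutes $rel and $key with the caller's arguments when expanding it.
DEFINE top3_by_avg(rel, key) RETURNS top3 {
    grouped = GROUP $rel BY $key;
    avgs    = FOREACH grouped GENERATE group, AVG($rel.score) AS s;
    sorted  = ORDER avgs BY s DESC;
    $top3   = LIMIT sorted 3;
};

-- Field types are assumed; the original script left them untyped.
e1 = LOAD 'emp' USING PigStorage() AS (empid:chararray, school:chararray, district:chararray, score:double);

-- One macro call per grouping key, replacing the two copies of e2-e5
emp_top3    = top3_by_avg(e1, empid);
school_top3 = top3_by_avg(e1, school);

-- STORE both outputs so Pig can consider them in a single multi-query plan
STORE emp_top3 INTO 'top3_by_empid';
STORE school_top3 INTO 'top3_by_school';
```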