Pig, mail # user - Losing ordering after using ORDER BY


Re: Losing ordering after using ORDER BY
James Newhaven 2012-05-30, 21:10
Thanks Jonathan. That worked fine.

James

On 29 May 2012, at 08:43 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> GROUP does not preserve the ordering of its input, so the ORDER BY you run
> before GROUP ALL is lost. Sort inside a nested FOREACH instead:
>
> D = FOREACH C GENERATE COUNT($1) AS countd;
> D1 = GROUP D ALL;
> -- Sort the grouped bag inside the nested block, right before the UDF sees it.
> D2 = FOREACH D1 {
>   ord = ORDER $1 BY $0 DESC;
>   GENERATE MyCustomEvalFunc(ord);
> }
>
> Keep in mind that you'll be ordering all of your data on one reducer, but
> this isn't very different from what you were doing before, since GROUP ALL
> generally sends all of your data to a single reducer anyway. If you run
> into memory issues, this is why.
>
> 2012/5/29 James Newhaven <[EMAIL PROTECTED]>
>
>> Hi,
>>
>> I've noticed that I seem to be losing the ordering of my relation after
>> passing the result of an ORDER BY to an EVAL function.
>>
>> For example:
>>
>> D = FOREACH C GENERATE COUNT($1) as countd;
>> E = ORDER D BY $0 DESC;
>> D1 = GROUP E ALL;
>> D2 = FOREACH D1 GENERATE MyCustomEvalFunc($1);
>>
>> When inspecting the results in MyCustomEvalFunc I noticed the ordering of
>> my results isn't the same as relation E (which uses ORDER BY DESC).
>>
>> Any help appreciated!
>>
>> Thanks,
>> James
>>
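
For reference, the nested-ORDER pattern from Jonathan's reply can be written out as a self-contained script. This is only a sketch under assumptions that are not in the thread: the input path 'events.tsv', the field names user and item, and the way C is built are all hypothetical, and the built-in FLATTEN stands in for MyCustomEvalFunc, whose implementation was never posted.

```pig
-- Hypothetical input: tab-separated (user, item) pairs. Path and field names are assumptions.
raw = LOAD 'events.tsv' AS (user:chararray, item:chararray);

-- Assumed shape of C: (group, bag of rows), so $1 is the bag that COUNT consumes.
C = GROUP raw BY user;
D = FOREACH C GENERATE COUNT($1) AS countd;

-- GROUP ALL collects every count into a single bag; an earlier ORDER BY would not survive this step.
D1 = GROUP D ALL;

-- Sort inside the nested block so the bag is ordered at the point where it is consumed.
D2 = FOREACH D1 {
    ord = ORDER D BY countd DESC;
    -- FLATTEN is a stand-in for MyCustomEvalFunc(ord).
    GENERATE FLATTEN(ord);
};

DUMP D2;
```

The sort happens after the grouping, so the bag is ordered at the moment it is handed to the function. As noted in the reply, all of the data still passes through a single reducer.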