Pig >> mail # user >> Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?


Re: Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?
@Andrew,
You can take a look at the conditions for merge-join here:
http://pig.apache.org/docs/r0.8.1/piglatin_ref1.html#Merge+Joins
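
To see why those conditions insist on sorted input, here is a minimal sketch of a generic sort-merge inner join in Python. It is not Pig's implementation, and the function name `merge_join` and the tuple layout are hypothetical, chosen only for illustration. It assumes both inputs are already sorted ascending on the join key, which lets it walk each side once instead of shuffling or hashing.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples that are ALREADY sorted ascending by key.

    Illustrative only: if either input is unsorted, rows are silently missed,
    which is why a merge join has to be told (or has to know) the data is sorted.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r)
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Each input is scanned once, front to back, so the cost is linear in the input sizes plus the number of output rows. That only works when the sort order really holds on both sides.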

@Kevin,
If you want to improve merge-join, the way to go is
https://issues.apache.org/jira/browse/PIG-959

Ashutosh

On Sun, Aug 21, 2011 at 04:27, Andrew Clegg
<andrew.clegg+[EMAIL PROTECTED]> wrote:

> I'd never thought about this before, but some of my scripts could
> probably be made much quicker by taking advantage of this. Which
> operations guarantee that their output relation is sorted? DISTINCT,
> GROUP BY, ORDER BY, a previous merge join, I guess? Any others?
>
> On 20 August 2011 07:12, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:
> > Hey Kevin,
> >
> > No, Pig currently doesn't auto-detect that data has already been sorted
> > by earlier steps of the script, so you need to tell it with USING 'merge'.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[EMAIL PROTECTED]> wrote:
> >
> >> I was reading about USING 'merge' with JOIN when relations are already
> >> sorted.
> >>
> >> I actually was just looking through some code and realized that one of my
> >> JOINs was on two relations that were *already* sorted due to a DISTINCT and
> >> GROUP operation.
> >>
> >> I just added USING 'merge' and the initial results look the same.
> >>
> >> I haven't benchmarked it though.
> >>
> >> Does/would the existing optimizer be able to detect this and just use merge
> >> without manual intervention?
> >>
> >> --
> >>
> >> Founder/CEO Spinn3r.com
> >>
> >> Location: *San Francisco, CA*
> >> Skype: *burtonator*
> >>
> >> Skype-in: *(415) 871-0687*
> >>
> >
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>