Kevin Burton 2011-08-20, 05:51
Ashutosh Chauhan 2011-08-20, 06:12
Kevin Burton 2011-08-20, 07:09
Andrew Clegg 2011-08-21, 11:27
Re: Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?
Ashutosh Chauhan 2011-08-21, 16:59
You can take a look at the conditions for merge-join here:
If you want to improve merge-join, the way to go is
On Sun, Aug 21, 2011 at 04:27, Andrew Clegg
> I'd never thought about this before, but some of my scripts could
> probably be made much quicker by taking advantage of this. After which
> operations are relations guaranteed to be sorted? Distinct, group by,
> order by, a previous merge join, I guess? Any others?
> On 20 August 2011 07:12, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:
> > Hey Kevin,
> > No, Pig currently doesn't auto-detect that data has been sorted in
> > previous steps of the script, so you need to tell it with USING 'merge'.
> > Hope it helps,
> > Ashutosh
> > On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[EMAIL PROTECTED]> wrote:
> >> I was reading about USING 'merge' with JOIN when relations are already
> >> sorted.
> >> I was actually just looking through some code and realized that one of my
> >> JOINs was on two relations that were *already* sorted due to a DISTINCT
> >> GROUP operation.
> >> I just added USING 'merge' and the initial results look the same.
> >> I haven't benchmarked it though.
> >> Does/would the existing optimizer be able to detect this and just use it
> >> without manual intervention?
> >> --
> >> Founder/CEO Spinn3r.com
> >> Location: *San Francisco, CA*
> >> Skype: *burtonator*
> >> Skype-in: *(415) 871-0687*
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
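Why sortedness matters for the JOIN ... USING 'merge' hint discussed above: when both inputs are already ordered on the join key, a join can walk them side by side in a single pass instead of shuffling and regrouping the data. Because Pig does not detect that ordering itself, the user has to assert it. The snippet below is a minimal plain-Python sketch of the general sort-merge join technique. It is not Pig's implementation. The names merge_join and is_sorted_on are hypothetical, and so is the choice to verify sortedness up front. That check stands in for the knowledge an optimizer would need before it could pick a merge join automatically.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def is_sorted_on(rows: Sequence[Row], key_idx: int) -> bool:
    """Return True if rows are in non-decreasing order of column key_idx."""
    return all(rows[i][key_idx] <= rows[i + 1][key_idx] for i in range(len(rows) - 1))


def merge_join(left: Sequence[Row], right: Sequence[Row],
               lk: int = 0, rk: int = 0) -> List[Row]:
    """Inner equi-join of two inputs already sorted ascending on their join keys.

    Hypothetical sketch: the sortedness check below is an assumption of this
    example, added so that unsorted input fails loudly instead of silently
    producing missing matches.
    """
    if not (is_sorted_on(left, lk) and is_sorted_on(right, rk)):
        raise ValueError("merge join requires both inputs sorted on the join key")

    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i][lk], right[j][rk]
        if a < b:
            i += 1          # left key is behind; advance left
        elif a > b:
            j += 1          # right key is behind; advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][lk] == a:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][rk] == a:
                j_end += 1
            # Emit the cross product of the two equal-key runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (4, "dave")]
    visits = [(1, "a.com"), (1, "b.com"), (3, "c.com"), (4, "d.com")]
    print(merge_join(users, visits))
```

Each input is read once, front to back, and no hashing or regrouping is needed. That single-pass property is what makes a merge join cheap, and it only holds if the ordering assumption is actually true.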