Re: Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?
Ashutosh Chauhan 2011-08-21, 16:59

@Andrew,
You can take a look at the conditions for merge-join here:
http://pig.apache.org/docs/r0.8.1/piglatin_ref1.html#Merge+Joins

@Kevin,
If you want to improve merge-join, way to go is
https://issues.apache.org/jira/browse/PIG-959

Ashutosh

On Sun, Aug 21, 2011 at 04:27, Andrew Clegg <andrew.clegg+[EMAIL PROTECTED]> wrote:

> I'd never thought about this before, but some of my scripts could
> probably be made much quicker by taking advantage of this. From what
> operations are relations guaranteed to be sorted? Distinct, group by,
> order by, previous merge join I guess? Any others?
>
> On 20 August 2011 07:12, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:
> > Hey Kevin,
> >
> > No, Pig currently doesn't auto-detect that data is getting sorted in
> > previous steps of script. So, you need to tell it by 'using merge'.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[EMAIL PROTECTED]> wrote:
> >
> >> I was reading about USING 'merge' with JOIN when relations are already
> >> sorted.
> >>
> >> I actually was just looking through some code and realized that one of
> >> my JOINs was on two relations that were *already* sorted due to a
> >> DISTINCT and GROUP operation.
> >>
> >> I just added USING 'merge' and the initial results look the same.
> >>
> >> I haven't benchmarked it though.
> >>
> >> Does/would the existing optimizer be able to detect this and just use
> >> merge without manual intervention?
> >>
> >> --
> >>
> >> Founder/CEO Spinn3r.com
> >>
> >> Location: *San Francisco, CA*
> >> Skype: *burtonator*
> >>
> >> Skype-in: *(415) 871-0687*
> >>
> >
> >
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
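
For readers wondering why a merge join needs to know its inputs are sorted, here is a minimal Python sketch of the general sort-merge join technique. It is not Pig's implementation; the function name `merge_join` and the sample data are made up for illustration. It assumes both inputs are lists of (key, value) tuples already sorted by key, and it walks each input once without building a hash table. If either input is not actually sorted, it can silently miss matches.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) tuples.

    Assumption (not checked here): BOTH lists are sorted by key.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small: advance left
        elif lk > rk:
            j += 1          # right key too small: advance right
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


# Hypothetical sample data, both sides sorted by key.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
result = merge_join(left, right)
```

Because each side is scanned front to back, the cost is linear in the input sizes plus the output size. That is the saving being discussed in this thread, and it only holds when the sort order is real.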