Pig >> mail # user >> Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?


Re: Does the pig optimizer keep track of relations that are already sorted when doing a JOIN?
@Andrew,
You can take a look at the conditions for merge-join here:
http://pig.apache.org/docs/r0.8.1/piglatin_ref1.html#Merge+Joins

@Kevin,
If you want to improve merge-join, the way to go is
https://issues.apache.org/jira/browse/PIG-959

Ashutosh
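
For readers unfamiliar with why pre-sorted inputs matter here, the following is a minimal, hypothetical sketch of a sort-merge join in Python. It is not Pig's implementation; the function name `merge_join` and the (key, value) tuple layout are assumptions made purely for illustration. It shows that when both inputs are already sorted on the join key, the join can be done in a single forward pass over each side, which is the property that USING 'merge' relies on.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, Any]  # (join key, value); layout is an assumption for this sketch


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, Any, Any]]:
    """Inner-join two lists of (key, value) rows that are ALREADY sorted by key.

    Neither input is re-sorted or hashed; each is scanned once, front to back.
    """
    i = j = 0
    out: List[Tuple[Any, Any, Any]] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key is behind: advance the left side
        elif lk > rk:
            j += 1  # right key is behind: advance the right side
        else:
            # Keys match: find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every pairing of rows within the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    users = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    clicks = [(2, "x"), (3, "y"), (4, "z")]
    print(merge_join(users, clicks))
```

If either input is not actually sorted on the key, a scan like this silently drops matches. That is one reason the user has to assert sortedness explicitly rather than have it assumed.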

On Sun, Aug 21, 2011 at 04:27, Andrew Clegg
<andrew.clegg+[EMAIL PROTECTED]> wrote:

> I'd never thought about this before, but some of my scripts could
> probably be made much quicker by taking advantage of this. From what
> operations are relations guaranteed to be sorted? Distinct, group by,
> order by, previous merge join I guess? Any others?
>
> On 20 August 2011 07:12, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote:
> > Hey Kevin,
> >
> > No, Pig currently doesn't auto-detect that data is getting sorted in
> > previous steps of the script. So you need to tell it by specifying
> > USING 'merge'.
> >
> > Hope it helps,
> > Ashutosh
> >
> > On Fri, Aug 19, 2011 at 22:51, Kevin Burton <[EMAIL PROTECTED]> wrote:
> >
> >> I was reading about USING 'merge' with JOIN when relations are already
> >> sorted.
> >>
> >> I actually was just looking through some code and realized that one of my
> >> JOINs was on two relations that were *already* sorted due to a DISTINCT and
> >> GROUP operation.
> >>
> >> I just added USING 'merge' and the initial results look the same.
> >>
> >> I haven't benchmarked it though.
> >>
> >> Does/would the existing optimizer be able to detect this and just use merge
> >> without manual intervention?
> >>
> >> --
> >>
> >> Founder/CEO Spinn3r.com
> >>
> >> Location: *San Francisco, CA*
> >> Skype: *burtonator*
> >>
> >> Skype-in: *(415) 871-0687*
> >>
> >
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>