Re: [jira] [Commented] (PIG-2747) Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source
Hi, Dmitriy,
Can you give the script you are thinking of?

On Sat, Jun 16, 2012 at 8:43 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> I don't think a union is required for this to make sense.
>
> On Jun 11, 2012, at 11:58 AM, "Daniel Dai (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> >
> > [https://issues.apache.org/jira/browse/PIG-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292983#comment-13292983]
> >
> > Daniel Dai commented on PIG-2747:
> > ---------------------------------
> >
> > My understanding is there is a union after T1, T2, right?
> >
> > Yes, we only merge consecutive filters into an "and" condition. We don't
> > merge "or" conditions. So you want
> >
> > filter cond1, filter cond2 -> union ==> filter cond1 or cond2
> >
> >> Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source
> >> ---------------------------------------------------------------------------------------------------------------------------
> >>
> >>                Key: PIG-2747
> >>                URL: https://issues.apache.org/jira/browse/PIG-2747
> >>            Project: Pig
> >>         Issue Type: Improvement
> >>           Reporter: Yu Xu
> >>           Priority: Minor
> >>
> >> Consider the following example:
> >> T = load ... ;
> >> T1 = filter T by col == 'hello';
> >> T2 = filter T by col == 'world';
> >> Currently, the Pig optimizer does not combine the two predicates and
> >> cannot push them down to the data source (via LoadMetadata). Thus the
> >> data source cannot do any filtering, and a full table/file scan is required.
> >> A more efficient manual workaround today is to rewrite the above script
> >> as the following equivalent one:
> >> T = load ...;
> >> T = filter T by col == 'hello' or col == 'world' ;
> >> T1 = filter T by col == 'hello';
> >> T2 = filter T by col == 'world';
> >> The above script enables Pig to push down the predicate (col == 'hello'
> >> or col == 'world') to the data source, which can then use available
> >> partitions/indexes for potentially much more efficient processing.
> >> This JIRA requests that the Pig optimizer perform this type of
> >> optimization automatically.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA administrators:
> > https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
> >
>
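
To make the rewrite requested in PIG-2747 concrete, here is a minimal Python sketch. It is not Pig code and does not reflect how the Pig optimizer is implemented. The `load`, `any_of`, and `run_branches` helpers and the sample rows are hypothetical names introduced only for illustration. The sketch assumes the loader can apply a pushed-down predicate at the source, and shows that pulling up the branch predicates as one disjunction leaves each branch's output unchanged while the loader returns fewer rows.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def any_of(*preds: Predicate) -> Predicate:
    """Combine branch predicates into one disjunction (cond1 OR cond2 ...)."""
    return lambda row: any(p(row) for p in preds)


def load(rows: List[Row], pushed_down: Optional[Predicate] = None) -> List[Row]:
    """Hypothetical loader: filters at the source if a predicate was pushed down."""
    if pushed_down is None:
        return list(rows)  # full scan, nothing filtered at the source
    return [r for r in rows if pushed_down(r)]


def run_branches(
    rows: List[Row], branch_preds: List[Predicate], pull_up: bool
) -> Tuple[List[List[Row]], int]:
    """Evaluate every filter branch over one shared load.

    With pull_up=True, the OR of all branch predicates is pushed to the loader
    first; each branch still applies its own filter afterwards.
    Returns the per-branch outputs and the number of rows the loader returned.
    """
    pushed = any_of(*branch_preds) if pull_up else None
    loaded = load(rows, pushed)
    outputs = [[r for r in loaded if p(r)] for p in branch_preds]
    return outputs, len(loaded)


# Sample data standing in for the elided "load ..." in the issue.
table = [{"col": v} for v in ["hello", "world", "foo", "hello", "bar"]]
is_hello: Predicate = lambda r: r["col"] == "hello"   # T1's predicate
is_world: Predicate = lambda r: r["col"] == "world"   # T2's predicate

plain, plain_rows = run_branches(table, [is_hello, is_world], pull_up=False)
optimized, optimized_rows = run_branches(table, [is_hello, is_world], pull_up=True)
```

Both runs produce the same T1 and T2 outputs; only the number of rows handed over by the loader differs. Whether a real data source benefits depends on whether it can evaluate the "or" predicate using its partitions or indexes.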