Pig >> mail # dev >> Re: HCatalog scans all partition even after mentioning date filter


Re: HCatalog scans all partition even after mentioning date filter
Hi Dmitriy and Thejas,

Should I open a jira for the same?

Thanks,
Aniket
On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Yeah I think we just need to get projection pushdown to work through
> Split operators.
>
> D
>
> On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> > cc'ing dev@pig as this is a pig issue.
> >
> > Aniket, What you saw is not related to PIG-2339 .
> >
> > In your example query, the logical plan will look like this -
> >
> >          Load (A)
> >             |
> >           Split
> >             |
> >    -------------------
> >    |                 |
> > Filter(B1)      Filter(B2) ...
> >
> > Because of the split operator introduced between the filter conditions and
> > load, the filter does not get pushed into the load function.
> >
> > A simple way to fix this in pig would be to not share the load across the
> > filter operators. Another option is to push the condition (B1 or B2 or B3)
> > into Load operator and retain rest of the current plan (split and filters
> > following the split).
> >
> > You can of course achieve the same effect by having a separate load
> > statement as input for each of the filters.
> >
> > I agree that we should make it possible to ask pig to throw a warning/error
> > if the query is going to result in a full table scan on a partitioned table.
> >
> > Thanks,
> > Thejas
> >
> >
> >
> >
> > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
> >>
> >> Sorry Thejas, I didn't look into the jira properly earlier.
> >> EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did not
> >> hit that issue earlier (and I patched datanucleus). filter-union was a
> >> workaround I was using to avoid some of the thrift timeout problems
> >> earlier. Thrift APIs time out on the client side in 20 sec by default (I
> >> found the config to change this later) and hence I used a = load
> >> 'table'; b1 = filter a by cond1; b2 = filter a by cond2;.. b = union b1, b2..;
> >> expecting these filters to be pushed separately to the loader. But that
> >> doesn't work in pig. (I can open a jira, but I haven't done enough
> >> investigation at the code level). Thoughts?
> >>
> >> Thanks,
> >> Aniket
> >>
> >> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> >>
> >>    The issue was not specific to filter-union
> >>    - https://issues.apache.org/jira/browse/PIG-2339.
> >>    The fix was to run PushUpFilter before PartitionFilterOptimizer.
> >>
> >>    As this is not an hcat issue, it should not matter if you have an
> >>    older hcat version. FYI, this bug was not there in pig 0.8.x.
> >>    Was it pig 0.9.0 or 0.9.1 that you used?
> >>
> >>    Thanks,
> >>    Thejas
> >>
> >>
> >>
> >>    On 4/24/12 5:21 PM, Aniket Mokashi wrote:
> >>
> >>        Hi Thejas,
> >>
> >>        Can you point me to jira that fixes filter-union problem (in
> >>        pig)? I haven't tried hcat-0.4 yet, good to know about that issue. I
> >>        will keep a watcher.
> >>
> >>        Thanks,
> >>        Aniket
> >>
> >>        On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> >>
> >>            Hi Aniket,
> >>            Are you using pig 0.9 or 0.9.1 ?
> >>            If yes, can you try with pig 0.9.2 ?
> >>            Wondering if you are also hitting the issue that Thomas mentioned.
> >>
> >>            Thanks,
> >>            Thejas
> >>
> >>
> >>
> >>
> >>            On 4/23/12 7:39 PM, Aniket Mokashi wrote:
> >>
> >>                Something similar I have noticed is -
> >>
> >>                A = load ...
> >>                B1 = filter A by cond1;
> >>                B2 = filter A by cond2;
> >>                B3 = filter A by cond3;
> >>
> >>                B = union B1, B2, B3; does not push projection.

"...:::Aniket:::... Quetzalco@tl"
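
For readers following the thread, the workaround Thejas mentions (a separate load statement as input for each filter) could be sketched in Pig Latin roughly as below. The table name `mytable`, the partition column `datestamp`, and the filter values are hypothetical, and the loader class `org.apache.hcatalog.pig.HCatLoader` is assumed from HCatalog releases of that period. Whether each filter actually reaches the loader depends on the Pig version in use.

```pig
-- Hypothetical table and partition column; adjust to the real schema.
-- One LOAD per filter, so no Split operator sits between a load and its filter.
a1 = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();
b1 = FILTER a1 BY datestamp == '20120423';

a2 = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();
b2 = FILTER a2 BY datestamp == '20120424';

-- Combine the per-partition results, as in the original filter-union script.
b = UNION b1, b2;
DUMP b;
```

Running `EXPLAIN b;` on this script and on the shared-load version is one way to compare where the filters end up in each plan.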