Pig, mail # dev - Re: HCatalog scans all partition even after mentioning date filter


Re: HCatalog scans all partition even after mentioning date filter
Aniket Mokashi 2012-04-25, 20:47
Hi Dmitriy and Thejas,

Should I open a jira for the same?

Thanks,
Aniket
On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Yeah I think we just need to get projection pushdown to work through
> Split operators.
>
> D
>
> On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> > cc'ing dev@pig as this is a pig issue.
> >
> > Aniket, what you saw is not related to PIG-2339.
> >
> > In your example query, the logical plan will look like this -
> >
> >            Load (A)
> >               |
> >             Split
> >               |
> >      -------------------
> >      |                 |
> >  Filter(B1)       Filter(B2) ...
> >
> > Because of the split operator introduced between the filter conditions and
> > load, the filter does not get pushed into the load function.
> >
> > A simple way to fix this in pig would be to not share the load across the
> > filter operators. Another option is to push the condition (B1 or B2 or B3)
> > into the Load operator and retain the rest of the current plan (split and
> > filters following the split).
> >
> > You can of course achieve the same effect by having a separate load
> > statement as input for each of the filters.
> >
> > I agree that we should make it possible to ask pig to throw a warning/error
> > if the query is going to result in a full table scan on a partitioned table.
> >
> > Thanks,
> > Thejas
> >
> >
> >
> >
> > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
> >>
> >> Sorry Thejas, I didn't look into the jira properly earlier.
> >> EMR pig-0.9.1 already has the patch for PIG-2339, and hence I did not
> >> hit that issue earlier (and I patched datanucleus). filter-union was a
> >> workaround I was using to avoid some of the thrift timeout problems
> >> earlier. Thrift APIs time out on the client side in 20 sec by default (I
> >> found the config to change this later), and hence I used a = load
> >> 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ... b = union b1, b2, ...;
> >> expecting these filters to be pushed separately to the loader. But that
> >> doesn't work in pig. (I can open a jira, but I haven't done enough
> >> investigation at the code level.) Thoughts?
> >>
> >> Thanks,
> >> Aniket
> >>
> >> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> >>
> >>    The issue was not specific to filter-union
> >>    - https://issues.apache.org/jira/browse/PIG-2339.
> >>    The fix was to run PushUpFilter before PartitionFilterOptimizer.
> >>
> >>    As this is not an hcat issue, it should not matter if you have an
> >>    older hcat version. FYI, this bug was not there in pig 0.8.x.
> >>    Was it pig 0.9.0 or 0.9.1 that you used?
> >>
> >>    Thanks,
> >>    Thejas
> >>
> >>
> >>
> >>    On 4/24/12 5:21 PM, Aniket Mokashi wrote:
> >>
> >>        Hi Thejas,
> >>
> >>        Can you point me to the jira that fixes the filter-union problem
> >>        (in pig)? I haven't tried hcat-0.4 yet, good to know about that
> >>        issue. I will keep a watcher.
> >>
> >>        Thanks,
> >>        Aniket
> >>
> >>        On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> >>
> >>            Hi Aniket,
> >>            Are you using pig 0.9 or 0.9.1?
> >>            If yes, can you try with pig 0.9.2?
> >>            Wondering if you are also hitting the issue that Thomas mentioned.
> >>
> >>            Thanks,
> >>            Thejas
> >>
> >>
> >>
> >>
> >>            On 4/23/12 7:39 PM, Aniket Mokashi wrote:
> >>
> >>                Something similar I have noticed is -
> >>
> >>                A = load ...
> >>                B1 = filter A by cond1;
> >>                B2 = filter A by cond2;
> >>                B3 = filter A by cond3;
> >>
> >>                B = union B1, B2, B3; does not push projection.
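
The workaround suggested in the thread (a separate load statement as input for each filter) might look like the sketch below. The table name 'mydb.events', the partition column 'dt', and the date values are assumptions for illustration only, and the HCatLoader class name is the one used by HCatalog releases of that period. The second variant writes the OR-ed condition by hand in a single filter directly after the load; whether it reaches the loader as a partition filter depends on the Pig version in use.

```pig
-- Sketch only: 'mydb.events' and the partition column 'dt' are hypothetical.

-- Variant 1: one LOAD per FILTER, so no Split operator sits between
-- the load and the filter in the logical plan.
A1 = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
B1 = FILTER A1 BY dt == '2012-04-23';

A2 = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
B2 = FILTER A2 BY dt == '2012-04-24';

B = UNION B1, B2;

-- Variant 2: a single FILTER with the conditions OR-ed together,
-- placed immediately after the LOAD.
A = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
C = FILTER A BY dt == '2012-04-23' OR dt == '2012-04-24';
```

Running EXPLAIN on B or C is one way to check whether the filter still appears as a separate operator after the load or has been handed to the loader.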

"...:::Aniket:::... Quetzalco@tl"