Re: HCatalog scans all partition even after mentioning date filter
yes, please create one.
Thanks,
Thejas

On 4/25/12 1:47 PM, Aniket Mokashi wrote:
> Hi Dmitriy and Thejas,
>
> Should I open a jira for the same?
>
> Thanks,
> Aniket
>
>
> On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>     Yeah I think we just need to get projection pushdown to work through
>     Split operators.
>
>     D
>
>     On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>      > cc'ing dev@pig as this is a pig issue.
>      >
>      > Aniket, what you saw is not related to PIG-2339.
>      >
>      > In your example query, the logical plan will look like this -
>      >
>      > Load (A)
>      >    |
>      >  Split
>      >    |
>      >    +-------------+---- ...
>      >    |             |
>      > Filter(B1)   Filter(B2) ...
>      >
>      > Because of the split operator introduced between the filter
>      > conditions and load, the filter does not get pushed into the load
>      > function.
>      >
>      > A simple way to fix this in pig would be to not share the load across
>      > the filter operators. Another option is to push the condition (B1 or
>      > B2 or B3) into the Load operator and retain the rest of the current
>      > plan (split and filters following the split).
>      >
>      > You can of course achieve the same effect by having a separate load
>      > statement as input for each of the filters.
>      >
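
A minimal Pig Latin sketch of the pattern and the two workarounds described above. The table name 'mydb.events', the partition column 'dt', and the date values are hypothetical; the loader class name assumes the HCatalog 0.4-era package (org.apache.hcatalog.pig.HCatLoader). Whether an OR condition is actually pushed down as a partition filter depends on the Pig and HCatalog versions in use, so treat the last variant as something to verify with EXPLAIN rather than a guarantee.

```pig
-- Pattern from the thread: one shared load feeding several filters.
-- Pig inserts a Split after A, so the filters are not pushed into the
-- loader and all partitions are scanned.
A  = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
B1 = FILTER A BY dt == '20120424';
B2 = FILTER A BY dt == '20120425';
B  = UNION B1, B2;

-- Workaround 1: a separate load statement as input for each filter,
-- so each filter sits directly on its own load with no Split between.
A1 = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
C1 = FILTER A1 BY dt == '20120424';
A2 = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
C2 = FILTER A2 BY dt == '20120425';
C  = UNION C1, C2;

-- Workaround 2 (manual version of "push B1 or B2 into the load"):
-- apply the combined condition right after the load, then branch.
D0 = LOAD 'mydb.events' USING org.apache.hcatalog.pig.HCatLoader();
D  = FILTER D0 BY (dt == '20120424') OR (dt == '20120425');
D1 = FILTER D BY dt == '20120424';
D2 = FILTER D BY dt == '20120425';
```

Running EXPLAIN on the final alias of each variant should show whether a Split still sits between the load and the filters.
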
>      > I agree that we should make it possible to ask pig to throw a
>      > warning/error if the query is going to result in a full table scan
>      > on a partitioned table.
>      >
>      > Thanks,
>      > Thejas
>      >
>      >
>      >
>      >
>      > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>      >>
>      >> Sorry Thejas, I didn't look into the jira properly earlier.
>      >> EMR pig-0.9.1 already has the patch for PIG-2339, and hence I did not
>      >> hit that issue earlier (and I patched datanucleus). filter-union was a
>      >> workaround I was using to avoid some of the thrift timeout problems
>      >> earlier. Thrift API calls time out on the client side after 20 sec by
>      >> default (I found the config to change this later), and hence I used
>      >> a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ...
>      >> b = union b1, b2, ...; expecting these filters to be pushed separately
>      >> to the loader. But that doesn't work in pig. (I can open a jira, but I
>      >> haven't done enough investigation at the code level.) Thoughts?
>      >>
>      >> Thanks,
>      >> Aniket
>      >>
>      >> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>      >>
>      >>    The issue was not specific to filter-union:
>      >>    https://issues.apache.org/jira/browse/PIG-2339
>      >>    The fix was to run PushUpFilter before PartitionFilterOptimizer.
>      >>
>      >>    As this is not an hcat issue, it should not matter if you have an
>      >>    older hcat version. FYI, this bug was not present in pig 0.8.x.
>      >>    Was it pig 0.9.0 or 0.9.1 that you used?
>      >>
>      >>    Thanks,
>      >>    Thejas
>      >>
>      >>
>      >>
>      >>    On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>      >>
>      >>        Hi Thejas,
>      >>
>      >>        Can you point me to the jira that fixes the filter-union
>      >>        problem (in pig)? I haven't tried hcat-0.4 yet; good to know
>      >>        about that issue. I will keep a watcher.
>      >>
>      >>        Thanks,
>      >>        Aniket
>      >>
>      >>        On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair
>      >> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>     <mailto:[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>>