Pig >> mail # dev >> Re: HCatalog scans all partition even after mentioning date filter


Thejas Nair 2012-04-25, 19:52
Dmitriy Ryaboy 2012-04-25, 20:45
Aniket Mokashi 2012-04-25, 20:47
Thejas Nair 2012-04-25, 21:04

Re: HCatalog scans all partition even after mentioning date filter
Thanks Thejas!
https://issues.apache.org/jira/browse/PIG-2668
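
Until that is fixed, the workaround Thejas describes below (a separate load statement as input for each filter) might look roughly like the following. This is only a sketch: the table name `web_logs`, the partition column `dt`, and the date values are made up for illustration, and the loader class path assumes the HCatalog releases of this period.

```pig
-- Hypothetical table 'web_logs', partitioned by a hypothetical column 'dt'.
-- Each filter gets its own load, so no Split operator sits between
-- the Load and the Filter in the logical plan.
a1 = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
b1 = FILTER a1 BY dt == '2012-04-24';

a2 = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
b2 = FILTER a2 BY dt == '2012-04-25';

b = UNION b1, b2;
```

If the thrift timeout is not a concern, a single filter with the conditions OR'ed together (`b = FILTER a BY dt == '2012-04-24' OR dt == '2012-04-25';`) also avoids the Split, and should likewise be eligible for pushdown into the loader, though I have not verified that here.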

On Wed, Apr 25, 2012 at 2:04 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:

> yes, please create one.
> Thanks,
> Thejas
>
>
> On 4/25/12 1:47 PM, Aniket Mokashi wrote:
>
>> Hi Dmitriy and Thejas,
>>
>> Should I open a jira for the same?
>>
>> Thanks,
>> Aniket
>>
>>
>> On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>
>>    Yeah I think we just need to get projection pushdown to work through
>>    Split operators.
>>
>>    D
>>
>>    On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>     > cc'ing dev@pig as this is a pig issue.
>>     >
>>     > Aniket, what you saw is not related to PIG-2339.
>>     >
>>     > In your example query, the logical plan will look like this -
>>     >
>>     > Load (A)
>>     > |
>>     > Split
>>     >  |
>>     > ---------------------------
>>     > |             |
>>     > Filter(B1)   Filter(B2) ...
>>     >
>>     > Because of the split operator introduced between the load and the
>>     > filter conditions, the filters do not get pushed into the load
>>     > function.
>>     >
>>     > A simple way to fix this in pig would be to not share the load
>>     > across the filter operators. Another option is to push the
>>     > condition (B1 or B2 or B3) into the Load operator and retain the
>>     > rest of the current plan (the split and the filters following it).
>>     >
>>     > You can of course achieve the same effect by having a separate load
>>     > statement as input for each of the filters.
>>     >
>>     > I agree that we should make it possible to ask pig to throw a
>>     > warning/error if the query is going to result in a full table scan
>>     > on a partitioned table.
>>     >
>>     > Thanks,
>>     > Thejas
>>     >
>>     >
>>     >
>>     >
>>     > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>>     >>
>>     >> Sorry Thejas, I didn't look into the jira properly earlier.
>>     >> EMR pig-0.9.1 already has the patch for PIG-2339, and hence I did
>>     >> not hit that issue earlier (and I patched datanucleus). filter-union
>>     >> was a workaround I was using to avoid some of the thrift timeout
>>     >> problems earlier. Thrift APIs time out on the client side after
>>     >> 20 sec by default (I found the config to change this later), and
>>     >> hence I used
>>     >>   a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ..
>>     >>   b = union b1, b2, ..;
>>     >> expecting these filters to be pushed separately to the loader. But
>>     >> that doesn't work in pig. (I can open a jira, but I haven't done
>>     >> enough investigation at the code level.) Thoughts?
>>     >>
>>     >> Thanks,
>>     >> Aniket
>>     >>
>>     >> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>     >>
>>     >>    The issue was not specific to filter-union:
>>     >>    https://issues.apache.org/jira/browse/PIG-2339
>>     >>    The fix was to run PushUpFilter before PartitionFilterOptimizer.
>>     >>
>>     >>    As this is not an hcat issue, it should not matter if you have an
>>     >>    older hcat version. FYI, this bug was not there in pig 0.8.x.
>>     >>    Was it pig 0.9.0 or 0.9.1 that you used?
>>     >>
>>     >>    Thanks,
>>     >>    Thejas
>>     >>
>>     >>
>>     >>
>>     >>    On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>>     >>
>>     >>        Hi Thejas,
>>     >>
>>     >>        Can you point me to the jira that fixes the filter-union
>>     >>        problem (in pig)? I haven't tried hcat-0.4 yet, good to know
>>     >>        about that issue. I will keep a
"...:::Aniket:::... Quetzalco@tl"