

Re: HCatalog scans all partition even after mentioning date filter
Yeah I think we just need to get projection pushdown to work through
Split operators.

D

On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> cc'ing dev@pig as this is a pig issue.
>
> Aniket, what you saw is not related to PIG-2339.
>
> In your example query, the logical plan will look like this -
>
>           Load (A)
>              |
>            Split
>              |
>      -----------------
>      |               |
>  Filter(B1)      Filter(B2) ...
>
> Because of the split operator introduced between the filter conditions and
> load, the filter does not get pushed into the load function.
>
> A simple way to fix this in pig would be to not share the load across the
> filter operators. Another option is to push the condition (B1 or B2 or B3)
> into the Load operator and retain the rest of the current plan (the split
> and the filters following it).
>
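> The second option can be approximated by hand in the script. A minimal
> sketch, assuming a hypothetical HCatalog table 'web_logs' partitioned on a
> hypothetical string column 'dt', and the HCatLoader class name used by
> HCatalog releases of this period. Whether the OR-ed condition actually
> reaches the loader depends on the Pig version:
>
> ```pig
> -- 'web_logs', 'dt' and the date values are hypothetical.
> A  = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
> -- One filter directly after the load: the OR of all branch conditions.
> A1 = FILTER A BY (dt == '20120423') OR (dt == '20120424');
> -- The branch filters still follow the implicit split, as in the current plan.
> B1 = FILTER A1 BY dt == '20120423';
> B2 = FILTER A1 BY dt == '20120424';
> B  = UNION B1, B2;
> ```
>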
> You can of course achieve the same effect by having a separate load
> statement as input for each of the filters.
>
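> A minimal sketch of the separate-load workaround, assuming a hypothetical
> HCatalog table 'web_logs' partitioned on a hypothetical string column 'dt',
> and the HCatLoader class name used by HCatalog releases of this period:
>
> ```pig
> -- 'web_logs', 'dt' and the date values are hypothetical.
> -- Each filter gets its own load, so no Split sits between load and filter.
> A1 = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
> A2 = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
> B1 = FILTER A1 BY dt == '20120423';
> B2 = FILTER A2 BY dt == '20120424';
> B  = UNION B1, B2;
> ```
>
> The cost is one load statement per filter, which gets verbose as the number
> of conditions grows.
>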
> I agree that we should make it possible to ask pig to throw a warning/error
> if the query is going to result in a full table scan on a partitioned table.
>
> Thanks,
> Thejas
>
>
>
>
> On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>>
>> Sorry Thejas, I didn't look into the jira properly earlier.
>> EMR pig-0.9.1 already has the patch for PIG-2339, and hence I did not
>> hit that issue earlier (and I patched datanucleus). Filter-union was a
>> workaround I was using to avoid some of the thrift timeout problems
>> earlier. Thrift APIs time out on the client side in 20 sec by default (I
>> found the config to change this later), and hence I used
>>
>>     a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ...
>>     b = union b1, b2, ...;
>>
>> expecting these filters to be pushed separately to the loader. But that
>> doesn't work in pig. (I can open a jira, but I haven't done enough
>> investigation at the code level.) Thoughts?
>>
>> Thanks,
>> Aniket
>>
>> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>
>>    The issue was not specific to filter-union:
>>    https://issues.apache.org/jira/browse/PIG-2339
>>    The fix was to run PushUpFilter before PartitionFilterOptimizer.
>>
>>    As this is not an hcat issue, it should not matter if you have an
>>    older hcat version. FYI, this bug was not there in pig 0.8.x.
>>    Was it pig 0.9.0 or 0.9.1 that you used?
>>
>>    Thanks,
>>    Thejas
>>
>>
>>
>>    On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>>
>>        Hi Thejas,
>>
>>        Can you point me to the jira that fixes the filter-union problem
>>        (in pig)? I haven't tried hcat-0.4 yet; good to know about that
>>        issue. I will keep a watcher.
>>
>>        Thanks,
>>        Aniket
>>
>>        On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>
>>            Hi Aniket,
>>            Are you using pig 0.9 or 0.9.1?
>>            If yes, can you try with pig 0.9.2?
>>            Wondering if you are also hitting the issue that Thomas mentioned.
>>
>>            Thanks,
>>            Thejas
>>
>>
>>
>>
>>            On 4/23/12 7:39 PM, Aniket Mokashi wrote:
>>
>>                Something similar I have noticed is -
>>
>>                A = load ...
>>                B1 = filter A by cond1;
>>                B2 = filter A by cond2;
>>                B3 = filter A by cond3;
>>
>>                B = union B1, B2, B3; does not push projection.
>>
>>                Is that expected?
>>
>>                Ideally, we should have a "strict" mode under hcatalog that,
>>                when turned on, will avoid executing pig queries on the full
>>                (partitioned) table.
>>
>>                Thanks,
>>                Aniket
>>
>>                On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan