Re: HCatalog scans all partition even after mentioning date filter
Dmitriy Ryaboy 2012-04-25, 20:45
Yeah I think we just need to get projection pushdown to work through
Split operators.

D

On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> cc'ing dev@pig as this is a pig issue.
>
> Aniket, what you saw is not related to PIG-2339.
>
> In your example query, the logical plan will look like this -
>
>              Load (A)
>                 |
>               Split
>                 |
>    ---------------------------
>    |                         |
> Filter(B1)              Filter(B2) ...
>
> Because of the split operator introduced between the filter conditions and
> load, the filter does not get pushed into the load function.
>
> A simple way to fix this in pig would be to not share the load across the
> filter operators. Another option is to push the condition (B1 or B2 or B3)
> into the Load operator and retain the rest of the current plan (split and
> filters following the split).
>
> You can of course achieve the same effect by having a separate load
> statement as input for each of the filters.
>
> I agree that we should make it possible to ask pig to throw a warning/error
> if the query is going to result in a full table scan on a partitioned table.
>
> Thanks,
> Thejas
>
>
> On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>>
>> Sorry Thejas, I didn't look into the jira properly earlier.
>> EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did not
>> hit that issue earlier (and I patched datanucleus). filter-union was a
>> workaround I was using to avoid some of the thrift timeout problems
>> earlier. Thrift APIs time out on the client side in 20 sec by default (I
>> found the config to change this later), and hence I used a = load
>> 'table'; b1 = filter a by cond1; b2 = filter a by cond2; .. b = union b1, b2..;
>> expecting these filters to be pushed separately to the loader. But that
>> doesn't work in pig. (I can open a jira, but I haven't done enough
>> investigation at the code level). Thoughts?
>>
>> Thanks,
>> Aniket
>>
>> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>>
>>     The issue was not specific to filter-union
>>     - https://issues.apache.org/jira/browse/PIG-2339.
>>     The fix was to run PushUpFilter before PartitionFilterOptimizer.
>>
>>     As this is not a hcat issue, it should not matter if you have an
>>     older hcat version. FYI, this bug was not there in pig 0.8.x.
>>     Was it pig 0.9.0 or 0.9.1 that you used?
>>
>>     Thanks,
>>     Thejas
>>
>>
>>     On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>>
>>         Hi Thejas,
>>
>>         Can you point me to the jira that fixes the filter-union problem
>>         (in pig)? I haven't tried hcat-0.4 yet, good to know about that
>>         issue. I will keep a watcher.
>>
>>         Thanks,
>>         Aniket
>>
>>         On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair
>>         <[EMAIL PROTECTED]> wrote:
>>
>>             Hi Aniket,
>>             Are you using pig 0.9 or 0.9.1?
>>             If yes, can you try with pig 0.9.2?
>>             Wondering if you are also hitting the issue that Thomas
>>             mentioned.
>>
>>             Thanks,
>>             Thejas
>>
>>
>>             On 4/23/12 7:39 PM, Aniket Mokashi wrote:
>>
>>                 Something similar I have noticed is -
>>
>>                 A = load ...
>>                 B1 = filter A by cond1;
>>                 B2 = filter A by cond2;
>>                 B3 = filter A by cond3;
>>
>>                 B = union B1, B2, B3; does not push projection.
>>
>>                 Is that expected?
>>
>>                 Ideally, we should have a "strict" mode under hcatalog
>>                 that, when turned on, will avoid executing pig queries
>>                 on the full (partitioned) table.
>>
>>                 Thanks,
>>                 Aniket
>>
>>                 On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan