Re: HCatalog scans all partition even after mentioning date filter
Aniket Mokashi 2012-04-25, 21:50
Thanks Thejas!
https://issues.apache.org/jira/browse/PIG-2668

On Wed, Apr 25, 2012 at 2:04 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:

> yes, please create one.
> Thanks,
> Thejas
>
> On 4/25/12 1:47 PM, Aniket Mokashi wrote:
>
>> Hi Dmitriy and Thejas,
>>
>> Should I open a jira for the same?
>>
>> Thanks,
>> Aniket
>>
>> On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>
>> Yeah I think we just need to get projection pushdown to work through
>> Split operators.
>>
>> D
>>
>> On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>> > cc'ing dev@pig as this is a pig issue.
>> >
>> > Aniket, What you saw is not related to PIG-2339.
>> >
>> > In your example query, the logical plan will look like this -
>> >
>> >              Load (A)
>> >                 |
>> >               Split
>> >                 |
>> >     ---------------------------
>> >     |                         |
>> >  Filter(B1)              Filter(B2) ...
>> >
>> > Because of the split operator introduced between the filter
>> > conditions and load, the filter does not get pushed into the load
>> > function.
>> >
>> > A simple way to fix this in pig would be to not share the load
>> > across the filter operators. Another option is to push the
>> > condition (B1 or B2 or B3) into Load operator and retain rest of
>> > the current plan (split and filters following the split).
>> >
>> > You can of course achieve the same effect by having a separate load
>> > statement as input for each of the filters.
>> >
>> > I agree that we should make it possible to ask pig to throw a
>> > warning/error if the query is going to result in a full table scan
>> > on a partitioned table.
>> >
>> > Thanks,
>> > Thejas
>> >
>> > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
>> >>
>> >> Sorry Thejas, I didn't look into the jira properly earlier.
>> >> EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did
>> >> not hit that issue earlier (and I patched datanucleus).
>> >> filter-union was a workaround I was using to avoid some of the
>> >> thrift timeout problems earlier. Thrift APIs time out on the
>> >> client side after 20 sec by default (I found the config to change
>> >> this later) and I hence used
>> >> a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ...
>> >> b = union b1, b2, ...;
>> >> expecting these filters to be pushed separately to the loader.
>> >> But that doesn't work in pig. (I can open a jira, but I haven't
>> >> done enough investigation at the code level.) Thoughts?
>> >>
>> >> Thanks,
>> >> Aniket
>> >>
>> >> On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]>
>> >> wrote:
>> >>
>> >> The issue was not specific to filter-union
>> >> - https://issues.apache.org/jira/browse/PIG-2339.
>> >> The fix was to run PushUpFilter before PartitionFilterOptimizer.
>> >>
>> >> As this is not a hcat issue, it should not matter if you have an
>> >> older hcat version. fyi, this bug was not there in pig 0.8.x.
>> >> Was it pig 0.9.0 or 0.9.1 that you used?
>> >>
>> >> Thanks,
>> >> Thejas
>> >>
>> >> On 4/24/12 5:21 PM, Aniket Mokashi wrote:
>> >>
>> >> Hi Thejas,
>> >>
>> >> Can you point me to jira that fixes filter-union problem (in pig)?
>> >> I haven't tried hcat-0.4 yet, good to know about that issue. I
>> >> will keep a

"...:::Aniket:::... Quetzalco@tl"
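
The workaround Thejas describes in the thread (a separate load statement as input for each filter) might look roughly like the Pig Latin sketch below. It is only an illustration and is not taken from the thread: the table name 'events', the partition column dt, and the date values are hypothetical, and the loader class name assumes the org.apache.hcatalog.pig package used by HCatalog releases of that period.

```pig
-- Pattern from the thread: one shared load feeding several filters.
-- Pig places a Split between A and the filters, so, as described
-- above, the partition filters are not pushed into the loader.
A  = LOAD 'events' USING org.apache.hcatalog.pig.HCatLoader();  -- hypothetical table
B1 = FILTER A BY dt == '2012-04-23';                            -- hypothetical partition column
B2 = FILTER A BY dt == '2012-04-24';
B  = UNION B1, B2;

-- Workaround sketch: give each filter its own load statement so that
-- no Split sits between a load and the filter that follows it.
A1 = LOAD 'events' USING org.apache.hcatalog.pig.HCatLoader();
C1 = FILTER A1 BY dt == '2012-04-23';
A2 = LOAD 'events' USING org.apache.hcatalog.pig.HCatLoader();
C2 = FILTER A2 BY dt == '2012-04-24';
C  = UNION C1, C2;
```

Whether each filter then reaches the loader still depends on the Pig version in use (see the PIG-2339 discussion above), so checking the plan with EXPLAIN before relying on this is advisable.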