Thejas Nair 2012-04-25, 19:52
Dmitriy Ryaboy 2012-04-25, 20:45
Re: HCatalog scans all partition even after mentioning date filter
Aniket Mokashi 2012-04-25, 20:47
Hi Dmitriy and Thejas,
Should I open a jira for the same?

Thanks,
Aniket

On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Yeah I think we just need to get projection pushdown to work through
> Split operators.
>
> D
>
> On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
>
> > cc'ing dev@pig as this is a pig issue.
> >
> > Aniket, what you saw is not related to PIG-2339.
> >
> > In your example query, the logical plan will look like this -
> >
> >            Load (A)
> >               |
> >             Split
> >               |
> >      -------------------
> >      |                 |
> >  Filter(B1)        Filter(B2) ...
> >
> > Because of the split operator introduced between the filter conditions
> > and load, the filter does not get pushed into the load function.
> >
> > A simple way to fix this in pig would be to not share the load across
> > the filter operators. Another option is to push the condition
> > (B1 or B2 or B3) into the Load operator and retain the rest of the
> > current plan (split and filters following the split).
> >
> > You can of course achieve the same effect by having a separate load
> > statement as input for each of the filters.
> >
> > I agree that we should make it possible to ask pig to throw a
> > warning/error if the query is going to result in a full table scan on
> > a partitioned table.
> >
> > Thanks,
> > Thejas
> >
> > On 4/24/12 7:56 PM, Aniket Mokashi wrote:
> >
> > > Sorry Thejas, I didn't look into the jira properly earlier.
> > > EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did
> > > not hit that issue earlier (and I patched datanucleus). filter-union
> > > was a workaround I was using to avoid some of the thrift timeout
> > > problems earlier. Thrift APIs time out on the client side in 20 sec
> > > by default (I found the config to change this later) and hence I
> > > used
> > >   a = load 'table'; b1 = filter a by cond1; b2 = filter a by cond2; ..
> > >   b = union b1, b2, ..;
> > > expecting these filters to be pushed separately to the loader. But
> > > that doesn't work in pig.
> > > (I can open a jira, but I haven't done enough investigation at the
> > > code level). Thoughts?
> > >
> > > Thanks,
> > > Aniket
> > >
> > > On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> > >
> > > > The issue was not specific to filter-union
> > > > - https://issues.apache.org/jira/browse/PIG-2339.
> > > > The fix was to do filter PushUpFilter before
> > > > PartitionFilterOptimizer.
> > > >
> > > > As this is not a hcat issue, it should not matter if you have an
> > > > older hcat version. fyi, this bug was not there in pig 0.8.x.
> > > > Was it pig 0.9.0 or 0.9.1 that you used?
> > > >
> > > > Thanks,
> > > > Thejas
> > > >
> > > > On 4/24/12 5:21 PM, Aniket Mokashi wrote:
> > > >
> > > > > Hi Thejas,
> > > > >
> > > > > Can you point me to the jira that fixes the filter-union problem
> > > > > (in pig)? I haven't tried hcat-0.4 yet, good to know about that
> > > > > issue. I will keep a watcher.
> > > > >
> > > > > Thanks,
> > > > > Aniket
> > > > >
> > > > > On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi Aniket,
> > > > > > Are you using pig 0.9 or 0.9.1?
> > > > > > If yes, can you try with pig 0.9.2?
> > > > > > Wondering if you are also hitting the issue that Thomas
> > > > > > mentioned.
> > > > > >
> > > > > > Thanks,
> > > > > > Thejas
> > > > > >
> > > > > > On 4/23/12 7:39 PM, Aniket Mokashi wrote:
> > > > > >
> > > > > > > Something similar I have noticed is -
> > > > > > >
> > > > > > > A = load ...
> > > > > > > B1 = filter A by cond1;
> > > > > > > B2 = filter A by cond2;
> > > > > > > B3 = filter A by cond3;
> > > > > > >
> > > > > > > B = union B1, B2, B3; does not push projection.

"...:::Aniket:::... Quetzalco@tl"
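
The second option Thejas describes (push the condition B1 or B2 or B3 into the Load operator and keep the split and filters after it) can be illustrated with a small Python model. This is only a sketch of the idea, not Pig's optimizer code: the table contents, the `dt` partition column, and the helpers `load`, `split_filter_union` and `any_of` are all hypothetical names made up for the illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]

# Hypothetical partitioned table: partition value of "dt" -> rows stored there.
TABLE: Dict[str, List[Row]] = {
    "2012-04-01": [{"dt": "2012-04-01", "v": "a"}],
    "2012-04-02": [{"dt": "2012-04-02", "v": "b"}],
    "2012-04-03": [{"dt": "2012-04-03", "v": "c"}],
    "2012-04-04": [{"dt": "2012-04-04", "v": "d"}],
}


def load(partition_filter: Optional[Predicate] = None) -> Tuple[List[Row], List[str]]:
    """Model of a loader: returns (rows, partitions scanned).

    With no partition filter every partition is read (a full table scan).
    """
    rows: List[Row] = []
    scanned: List[str] = []
    for part, part_rows in TABLE.items():
        if partition_filter is None or partition_filter({"dt": part}):
            scanned.append(part)
            rows.extend(part_rows)
    return rows, scanned


def split_filter_union(rows: List[Row], conds: List[Predicate]) -> List[Row]:
    """Model of Split -> Filter(B1), Filter(B2), ... -> Union."""
    out: List[Row] = []
    for cond in conds:
        out.extend(r for r in rows if cond(r))
    return out


def any_of(conds: List[Predicate]) -> Predicate:
    """Combined condition (B1 or B2 or ...) that is handed to the loader."""
    return lambda r: any(c(r) for c in conds)


# Two filters on the partition column, as in the B1/B2 example.
conds: List[Predicate] = [
    lambda r: r["dt"] == "2012-04-01",
    lambda r: r["dt"] == "2012-04-03",
]

# Behaviour reported in the thread: nothing reaches the loader.
rows_all, scanned_all = load()
result_unpushed = split_filter_union(rows_all, conds)

# Proposed plan: OR of the conditions goes to the loader, the split and
# the individual filters stay in place afterwards.
rows_pruned, scanned_pruned = load(any_of(conds))
result_pushed = split_filter_union(rows_pruned, conds)
```

In this model both plans produce the same rows, but the second one only reads the partitions that at least one filter can match. Keeping the individual filters after the split is what preserves the per-branch results.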
Thejas Nair 2012-04-25, 21:04
Aniket Mokashi 2012-04-25, 21:50