Yes, in theory the filter should be pushed above the foreach. I don't know
what happened; the easiest way is to run explain and check the plan.
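One way to picture the rewrite Daniel expects: a foreach that only renames columns (as one added by an AS clause might) does not change which rows survive. A filter can therefore move ahead of it, which leaves the filter directly after the load. The Java below is a toy model under that assumption. The `Op` enum and `pushFilterAboveForeach` method are hypothetical and are not Pig's optimizer classes or API. A real optimizer also has to remap the filter's column references across the rename, and this sketch ignores that step.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a linear logical plan. Op and pushFilterAboveForeach are
// hypothetical names for illustration only; they are not Pig classes.
public class FilterPushdownSketch {
    enum Op { LOAD, FOREACH, FILTER, STORE }

    // Repeatedly swaps a FOREACH that is immediately followed by a FILTER.
    // Assumption: every FOREACH is a pure projection/rename, so moving the
    // filter ahead of it keeps the same output rows.
    static List<Op> pushFilterAboveForeach(List<Op> plan) {
        List<Op> out = new ArrayList<>(plan); // copy; input is left untouched
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i + 1 < out.size(); i++) {
                if (out.get(i) == Op.FOREACH && out.get(i + 1) == Op.FILTER) {
                    out.set(i, Op.FILTER);
                    out.set(i + 1, Op.FOREACH);
                    changed = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // load -> foreach (from AS) -> filter -> store
        List<Op> plan = List.of(Op.LOAD, Op.FOREACH, Op.FILTER, Op.STORE);
        // prints "[LOAD, FOREACH, FILTER, STORE] -> [LOAD, FILTER, FOREACH, STORE]"
        System.out.println(plan + " -> " + pushFilterAboveForeach(plan));
    }
}
```

If this reordering does not show up in the explain output, the filter stays separated from the load.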
On Fri, Mar 15, 2013 at 11:32 AM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
> Yes, I do use AS in the load statement. I thought filters were always
> pushed as close to the Load operator as possible? What kind of
> Foreach gets added?
> On Fri, Mar 15, 2013 at 10:57 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>> getPartitionKeys should be called by default. Did you use an "AS" clause
>> in the load statement? That could add a foreach between Load and Filter,
>> and getPartitionKeys will only be invoked if the filter comes right after
>> the load. Run explain to check for it.
>> On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>> For CustomLoader (a class I'm implementing), which extends LoadFunc and
>>> implements LoadMetadata, the "getPartitionKeys" function is supposed
>>> to be called by "PartitionFilterOptimizer", right? I put some debug
>>> statements in "getPartitionKeys", but the function never seems to be
>>> called.
>>> I've read through some of the Pig source: optimization rules can be
>>> disabled by properties, but by default "PartitionFilterOptimizer" should
>>> be enabled. Also, in "PartitionFilterOptimizer", I saw some other checks,
>>> e.g. that the Filter operator cannot depend on anything other than the
>>> load, which is true in my case. Anyway, can someone shed some light on
>>> this? Am I understanding this interface incorrectly?
>>> My script is very simple (line 1 is load, line 2 is filter, and line 3
>>> is store), so the logical plan should be very simple too. Also, I'm
>>> testing this in Pig local mode; not sure if that matters.
>>> Greatly appreciate any hints!
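Daniel's condition (getPartitionKeys is only invoked if the filter is right after the load) can be pictured as an adjacency test on the plan. The partition-filter rule fires only when the load's immediate successor is a filter. An extra foreach in between makes the rule skip the load, so the loader's getPartitionKeys is never reached. The Java below is a toy model with hypothetical names (`Op`, `partitionFilterCanFire`). It is not Pig's PartitionFilterOptimizer, which works on a logical-plan graph and performs further checks.

```java
import java.util.List;

// Toy model: a linear plan as a list of operators.
// Op and partitionFilterCanFire are hypothetical, not Pig's API.
public class PartitionFilterCheckSketch {
    enum Op { LOAD, FOREACH, FILTER, STORE }

    // True only when the operator immediately after LOAD is a FILTER,
    // mirroring the "filter is right after load" condition in the thread.
    static boolean partitionFilterCanFire(List<Op> plan) {
        int load = plan.indexOf(Op.LOAD);
        return load >= 0
            && load + 1 < plan.size()
            && plan.get(load + 1) == Op.FILTER;
    }

    public static void main(String[] args) {
        // load -> filter -> store: the rule can consult the loader
        System.out.println(partitionFilterCanFire(
            List.of(Op.LOAD, Op.FILTER, Op.STORE)));               // true
        // a foreach (e.g. one an AS clause may add) sits between load and filter
        System.out.println(partitionFilterCanFire(
            List.of(Op.LOAD, Op.FOREACH, Op.FILTER, Op.STORE)));   // false
    }
}
```

Reading the explain output for the second shape is the quickest way to confirm whether this is what blocks the call.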