Pig, mail # dev - LoadFunc and LoadMetadata


Jeff Yuan 2013-03-15, 03:37
Daniel Dai 2013-03-15, 17:57
Jeff Yuan 2013-03-15, 18:32
Re: LoadFunc and LoadMetadata
Daniel Dai 2013-03-15, 22:37
Yes, in theory the filter should be pushed above the foreach. I don't know what
happened. The easiest way forward is to run an explain and check the plan.

Daniel
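
For readers following along, a minimal sketch of the check suggested in this thread. The input path, the loader's package name, the field names, and the partition column `dt` are all hypothetical; only the `AS` clause, the filter directly after the load, and the use of explain come from the discussion. The first variant uses `AS`, which per the earlier reply could place a foreach between the Load and the Filter. The second variant drops `AS` and assumes the loader reports its schema through `LoadMetadata.getSchema`, so the filter can still refer to `dt` by name.

```pig
-- Variant 1: AS clause in the load statement (hypothetical path, loader, fields).
-- Per the thread, this could add a foreach between Load and Filter.
A = LOAD '/data/events' USING com.example.CustomLoader() AS (dt:chararray, user:chararray);
B = FILTER A BY dt == '2013-03-14';
EXPLAIN B;

-- Variant 2: no AS clause; assumes CustomLoader.getSchema() supplies 'dt'.
-- Here the filter should sit directly after the load in the logical plan.
C = LOAD '/data/events' USING com.example.CustomLoader();
D = FILTER C BY dt == '2013-03-14';
EXPLAIN D;
```

Comparing the two logical plans in the explain output should show whether a foreach sits between the load and the filter in the first variant.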

On Fri, Mar 15, 2013 at 11:32 AM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
> Yes, I do use AS in the load statement. I thought Filters are always
> pushed as close to the Load operators as possible? What kind of
> Foreach is added?
>
> Thanks,
> Jeff
>
> On Fri, Mar 15, 2013 at 10:57 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
>> getPartitionKeys should be called by default. Did you use an "AS" clause
>> in the load statement? That could add a foreach between Load and Filter,
>> and getPartitionKeys will only be invoked if the filter is right after
>> the load. Run an explain to check for it.
>>
>> Thanks,
>> Daniel
>>
>> On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>>
>>> For CustomLoader (a class I'm implementing), which extends LoadFunc and
>>> implements LoadMetadata, the "getPartitionKeys" function is supposed
>>> to be called by "PartitionFilterOptimizer", right? I put some debug
>>> statements in "getPartitionKeys", but this function doesn't seem to
>>> ever be called.
>>>
>>> I've read through some of the Pig source. Optimization rules can be disabled
>>> by properties, but by default the "PartitionFilterOptimizer" should be
>>> enabled. Also, in "PartitionFilterOptimizer", I saw some other checks,
>>> such as the Filter operator not having any dependency
>>> other than the load, which is true in my case. Anyway, can someone shed
>>> some light on this? Am I understanding this interface incorrectly?
>>>
>>> My script is very simple (line 1 is load, line 2 is filter, and line 3
>>> is store), so the Logical Plan should be very simple. Also, I'm
>>> testing this in Pig local mode; I'm not sure if that matters.
>>>
>>> Greatly appreciate any hints!