Pig, mail # dev - LoadFunc and LoadMetadata


Jeff Yuan 2013-03-15, 03:37
Daniel Dai 2013-03-15, 17:57
Re: LoadFunc and LoadMetadata
Jeff Yuan 2013-03-15, 18:32
Yes, I do use AS in the load statement. I thought Filters are always
pushed as close to the Load operators as possible? What kind of
Foreach is added?

Thanks,
Jeff

On Fri, Mar 15, 2013 at 10:57 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> getPartitionKeys should be called by default. Did you use "AS" clause
> in load statement? That could add a foreach between Load and Filter,
> and getPartitionKeys will only be invoked if filter is right after
> load. Do an explain to check for it.
>
> Thanks,
> Daniel
>
> On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> For CustomLoader (a class I'm implementing), which extends LoadFunc and
>> implements LoadMetadata, the "getPartitionKeys" function is supposed
>> to be called by "PartitionFilterOptimizer", right? I put some debug
>> statements in "getPartitionKeys", but this function doesn't seem like
>> it's ever called.
>>
>> I've read through some of the Pig source. Optimization rules can be
>> disabled by properties, but by default the "PartitionFilterOptimizer"
>> should be enabled. Also, in "PartitionFilterOptimizer", I saw some
>> other checks, for example that the Filter operator cannot have any
>> dependency other than load, which is true in my case. Anyway, can someone shed
>> some light on this? Am I understanding this interface incorrectly?
>>
>> My script is very simple (line 1 is load, line 2 is filter, and line 3
>> is store), so the Logical Plan should be very simple. Also, I'm
>> testing this in Pig local mode, not sure if that matters.
>>
>> Greatly appreciate any hints!
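
Editor's sketch of the condition Daniel describes in his reply: the partition filter rule only applies when the filter sits directly after the load, and an "AS" clause may put a foreach in between. This is a toy model, not Pig code. The Op enum, buildPlan, and filterDirectlyFollowsLoad are hypothetical names made up for illustration, and the sketch assumes that "AS" always inserts a foreach, whereas Daniel only says it "could". Running explain on the real script remains the way to see the actual plan.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionFilterCheck {
    // Toy stand-ins for logical plan operators; these are NOT Pig classes.
    enum Op { LOAD, FOREACH, FILTER, STORE }

    // Models a three-line script: load [AS schema], filter, store.
    // Assumption for illustration: an AS clause always adds a FOREACH after LOAD.
    static List<Op> buildPlan(boolean loadHasAsClause) {
        List<Op> plan = new ArrayList<>();
        plan.add(Op.LOAD);
        if (loadHasAsClause) {
            plan.add(Op.FOREACH);
        }
        plan.add(Op.FILTER);
        plan.add(Op.STORE);
        return plan;
    }

    // True only if some FILTER is the immediate successor of a LOAD,
    // which is the situation in which getPartitionKeys would be consulted.
    static boolean filterDirectlyFollowsLoad(List<Op> plan) {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i) == Op.LOAD && plan.get(i + 1) == Op.FILTER) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("without AS: " + filterDirectlyFollowsLoad(buildPlan(false)));
        System.out.println("with AS:    " + filterDirectlyFollowsLoad(buildPlan(true)));
    }
}
```

In this model the plan without "AS" passes the check and the plan with "AS" does not, which matches the symptom in the original question: debug statements in getPartitionKeys never fire.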
Daniel Dai 2013-03-15, 22:37