Pig >> mail # dev >> LoadFunc and LoadMetadata


Re: LoadFunc and LoadMetadata
Yes, I do use AS in the load statement. I thought filters were always
pushed as close to the Load operators as possible? What kind of
Foreach gets added?

Thanks,
Jeff

On Fri, Mar 15, 2013 at 10:57 AM, Daniel Dai <[EMAIL PROTECTED]> wrote:
> getPartitionKeys should be called by default. Did you use an "AS" clause
> in the load statement? That could add a foreach between Load and Filter,
> and getPartitionKeys will only be invoked if the filter is right after
> the load. Run an explain to check for it.
>
> Thanks,
> Daniel
>
> On Thu, Mar 14, 2013 at 8:37 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> For CustomLoader (a class I'm implementing), which extends LoadFunc and
>> implements LoadMetadata, the "getPartitionKeys" function is supposed
>> to be called by "PartitionFilterOptimizer", right? I put some debug
>> statements in "getPartitionKeys", but the function never seems to be
>> called.
>>
>> I've read through some of the Pig source. Optimization rules can be
>> disabled by properties, but by default the "PartitionFilterOptimizer"
>> should be enabled. Also, in "PartitionFilterOptimizer", I saw some
>> other checks, such as that the Filter operator cannot have any
>> dependency other than the load, which is true in my case. Anyway, can
>> someone shed some light on this? Am I misunderstanding this interface?
>>
>> My script is very simple (line 1 is load, line 2 is filter, and line 3
>> is store), so the Logical Plan should be very simple. Also, I'm
>> testing this in Pig local mode; I'm not sure if that matters.
>>
>> Greatly appreciate any hints!
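
Daniel's point, that getPartitionKeys is only invoked when the filter sits directly after the load, can be illustrated with a toy model. The sketch below is not Pig's optimizer code: the class PlanSketch, the enum Op, and the method filterDirectlyAfterLoad are hypothetical names invented for illustration, and the plan is simplified to a linear list of operators. It only shows why an extra Foreach between Load and Filter (which, per Daniel's reply, an AS clause could add) would stop such a rule from matching.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a linear logical plan. Hypothetical; not Pig's real classes. */
public class PlanSketch {
    /** Simplified operator kinds (assumption: a real plan is a graph, not a list). */
    public enum Op { LOAD, FOREACH, FILTER, STORE }

    /**
     * Returns true when the first FILTER in the plan is immediately preceded
     * by a LOAD, which is the condition described in the thread for the
     * partition filter rule to apply. Returns false if there is no FILTER.
     */
    public static boolean filterDirectlyAfterLoad(List<Op> plan) {
        for (int i = 0; i < plan.size(); i++) {
            if (plan.get(i) == Op.FILTER) {
                return i > 0 && plan.get(i - 1) == Op.LOAD;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The three-line script from the original question: load, filter, store.
        List<Op> expected = Arrays.asList(Op.LOAD, Op.FILTER, Op.STORE);
        // The same script if a Foreach ends up between Load and Filter.
        List<Op> withForeach = Arrays.asList(Op.LOAD, Op.FOREACH, Op.FILTER, Op.STORE);

        System.out.println(filterDirectlyAfterLoad(expected));    // prints true
        System.out.println(filterDirectlyAfterLoad(withForeach)); // prints false
    }
}
```

Whether the real plan contains that extra Foreach is what running explain on the script, as Daniel suggests, would show.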