

Re: Loader partitioning on field
Thanks! Regarding 1), where the filter on a partition field includes a
UDF: is the UDF not evaluated first, with the result then passed to
the load function?

A separate question: In a LoadFunc, is there a way to get a reference
to the logical query plan?

Thanks again.

On Thu, Mar 14, 2013 at 1:51 PM, Rohini Palaniswamy
<[EMAIL PROTECTED]> wrote:
> Jeff,
>
> 1) It should not. If Pig does push the filter down, that is a bug in Pig.
>
> 2) I think it should be fine.
>
> 3) Look at PColFilterExtractor and PartitionFilterOptimizer
>
> Regards,
>
> Rohini
>
>
> On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:
>
>> I am writing a loader for a storage format, which partitions by a
>> particular field in the record. So I would like to implement something
>> which can push down filters on the partitioned field so that the
>> record reader does not need to read files that are outside the
>> filtered range. In the interface "LoadMetadata", the
>> "getPartitionKeys" and "setPartitionFilter" functions seem to support
>> what I need (where Pig should pass the filtering expression on the
>> declared partition keys to "setPartitionFilter"), but I have a couple
>> of questions. I'm going to reference the following example, where
>> timestamp is the partition key.
>>
>> a = load 'stored_data' using CustomLoader();
>> b = filter a by timestamp == CUSTOM_UDF(date, month);
>>
>> 1. Would partitioning work in this case where the partition key filter
>> includes a UDF?
>>
>> 2. Does the filter statement need to be directly after the load
>> statement? What I mean is, if I declare a variable c between a and b
>> which does some other operation on a, will Pig pass the filter
>> expression of b when loading a?
>>
>> 3. Can you point out roughly where this "setPartitionFilter" function
>> is called in Pig code during the load process? I couldn't seem to find
>> it through a search of the Pig source.
>>
>> Thanks a lot!
>>
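
The pruning goal described in the original question (skip files whose partition key falls outside the filtered range) can be sketched roughly as below. This is only an illustration and not Pig code: `PartitionPruner`, `selectFiles`, and the file names are hypothetical. It assumes the loader has already reduced the expression it received through `setPartitionFilter` to an inclusive range on the partition key, and that it keeps a sorted map from partition key to file path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper (not part of Pig): picks partition files by key range. */
final class PartitionPruner {

    /**
     * Returns the files whose partition key lies in [minKey, maxKey].
     * filesByKey maps a partition key (for example a timestamp) to a file path.
     */
    static List<String> selectFiles(NavigableMap<Long, String> filesByKey,
                                    long minKey, long maxKey) {
        if (minKey > maxKey) {
            // Empty range: no partition can match, so nothing needs to be read.
            return new ArrayList<>();
        }
        // subMap returns a view limited to the inclusive key range, in key order.
        return new ArrayList<>(
                filesByKey.subMap(minKey, true, maxKey, true).values());
    }

    public static void main(String[] args) {
        // Assumed layout: one file per day, keyed by a yyyyMMdd timestamp.
        NavigableMap<Long, String> files = new TreeMap<>();
        files.put(20130312L, "stored_data/20130312.dat");
        files.put(20130313L, "stored_data/20130313.dat");
        files.put(20130314L, "stored_data/20130314.dat");
        files.put(20130315L, "stored_data/20130315.dat");

        // Only the files for 13 and 14 March would be handed to the record reader.
        System.out.println(selectFiles(files, 20130313L, 20130314L));
    }
}
```

In a real loader, the list returned here would presumably feed the input splits, so files outside the range are never opened. Translating the partition filter expression into `minKey` and `maxKey` is left out because it depends on which operators the loader chooses to support.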