Pig, mail # user - Loader partitioning on field


Jeff Yuan 2013-03-14, 20:31
Re: Loader partitioning on field
Rohini Palaniswamy 2013-03-14, 20:51
Jeff,

1) It should not. If Pig does push down a filter that contains a UDF, then it is a bug in Pig.

2) I think it should be fine.

3) Look at PColFilterExtractor and PartitionFilterOptimizer.

Regards,

Rohini
On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan <[EMAIL PROTECTED]> wrote:

> I am writing a loader for a storage format, which partitions by a
> particular field in the record. So I would like to implement something
> which can push down filters on the partitioned field so that the
> record reader does not need to read files that are outside the
> filtered range. In the interface "LoadMetadata", the
> "getPartitionKeys" and "setPartitionFilter" functions seem to support
> what I need (where Pig should pass the filtering expression on the
> declared partition keys to "setPartitionFilter"), but I have a couple
> of questions. I'm going to reference the following example, where
> timestamp is the partition key.
>
> a = load 'stored_data' using CustomLoader();
> b = filter a by timestamp == CUSTOM_UDF(date, month);
>
> 1. Would partitioning work in this case where the partition key filter
> includes a UDF?
>
> 2. Does the partition filter statement need to be directly after the load
> statement? What I mean is, if I declare a variable c between a and b
> which does some other operation on a, will Pig pass the filter
> expression of b when loading a?
>
> 3. Can you point out roughly where this "setPartitionFilter" function
> is called in Pig code during the load process? I couldn't seem to find
> it through a search of the Pig source.
>
> Thanks a lot!
>
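
For readers following the thread, here is a minimal sketch of the pruning step Jeff describes: given a range on the partition key, pick only the files that fall inside it. It is plain Java with no Pig dependency. The class name, the method name, the file names and the choice of a long timestamp key are all assumptions made for illustration; how the range is derived from the expression Pig hands to "setPartitionFilter" is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-alone model of partition pruning.
// None of these names come from Pig; they only illustrate the idea.
final class PartitionPruning {

    // Returns the files whose partition key lies in [minTs, maxTs], in key order.
    // 'partitions' maps each partition's timestamp to the file that stores it.
    static List<String> selectFiles(NavigableMap<Long, String> partitions,
                                    long minTs, long maxTs) {
        if (minTs > maxTs) {
            return new ArrayList<>(); // empty range: nothing to read
        }
        // subMap is a view limited to the key range, so files outside
        // the range are never handed to the record reader.
        return new ArrayList<>(partitions.subMap(minTs, true, maxTs, true).values());
    }

    public static void main(String[] args) {
        // Made-up layout: one file per day under 'stored_data'.
        NavigableMap<Long, String> partitions = new TreeMap<>();
        partitions.put(20130312L, "stored_data/20130312.dat");
        partitions.put(20130313L, "stored_data/20130313.dat");
        partitions.put(20130314L, "stored_data/20130314.dat");
        partitions.put(20130315L, "stored_data/20130315.dat");

        // Only the files for the 13th and 14th are selected.
        System.out.println(selectFiles(partitions, 20130313L, 20130314L));
    }
}
```

A sorted map keeps the lookup proportional to the number of matching partitions rather than to the total number of files. A real loader would presumably apply the same selection when it builds its input splits.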
Jeff Yuan 2013-03-14, 21:00
Rohini Palaniswamy 2013-03-14, 21:30
Jeff Yuan 2013-03-14, 22:03
Jonathan Coveney 2013-03-14, 22:15
Jeff Yuan 2013-03-14, 22:56
Jonathan Coveney 2013-03-15, 10:17