Hive >> mail # user >> View Partition Pruning not Occurring during transform


Re: View Partition Pruning not Occurring during transform
I assume the reason for this is that the Hive compiler has no way of determining that the 'day' that is input into the transform script is the same 'day' that is output from the transform script. Even if it did, it's unclear whether pushing the predicate down would be legal without knowing the semantics of the transformation. Any optimization here will likely need an annotation somewhere saying that certain columns in the output of a transform refer to specific columns in its input for predicate pushdown purposes (and that such pushdown is legal for this transformation).
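
To make the legality concern concrete, here is a minimal sketch in Python. It is only an illustration, not Hive code: the function names (`cat_transform`, `next_day_transform`, `filter_day`) and the sample rows are hypothetical, and the transforms are modeled as plain functions rather than scripts reading stdin. It compares filtering on 'day' after the transform (what the query against the view asks for) with filtering before it (what pushdown would do).

```python
from datetime import date, timedelta

# Hypothetical sample rows of (day, ip); not data from the thread.
rows = [("2012-10-07", "10.0.0.1"), ("2012-10-08", "10.0.0.2")]
wanted = "2012-10-08"


def cat_transform(rows):
    """Stand-in for 'cat': output 'day' is identical to input 'day'."""
    return list(rows)


def next_day_transform(rows):
    """Hypothetical transform whose output 'day' is the input 'day' plus one."""
    out = []
    for day, ip in rows:
        shifted = date.fromisoformat(day) + timedelta(days=1)
        out.append((shifted.isoformat(), ip))
    return out


def filter_day(rows, wanted):
    """Keep only rows whose first column equals the wanted day."""
    return [r for r in rows if r[0] == wanted]


def filter_after(transform):
    # Semantics of the query: transform everything, then filter the output.
    return filter_day(transform(rows), wanted)


def filter_before(transform):
    # Semantics of pushdown: filter the input, then transform what is left.
    return transform(filter_day(rows, wanted))


for t in (cat_transform, next_day_transform):
    print(t.__name__, filter_after(t) == filter_before(t))
```

For the 'cat' stand-in the two orders agree, so pushing the filter down would be safe. For the day-shifting transform they return different rows, which is why the compiler cannot push the predicate through an arbitrary script without being told it is legal.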

thanks,
Shrikanth
On Oct 10, 2012, at 12:04 PM, John Omernik wrote:

> Greetings all, I am trying to incorporate a TRANSFORM into a view (so we can abstract the transform script away from the user)
>
>
>
> As a test, I have a table partitioned on day (in YYYY-MM-DD format) with lots of partitions
>
> and I tried this
>
> CREATE VIEW view_transform as
> Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table;
>
> I used 'cat' in my test because I know every node has it. If this works, I will distribute my transform scripts to each node manually.
>
> When I run
>
> SELECT * from view_transform where day = '2012-10-08';
>
> 10,432 map tasks get spun up.
>
> If I rewrite the view to be
>
> CREATE VIEW view_transform as
> Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table where day = '2012-10-08';
>
> Then only 16 map tasks get spun up (the desired behavior, but the pruning is happening in the view, not in the query).
>
> Thus I wanted input on whether this should be considered a bug. That is, should we be able to apply a partition spec to a view that uses a transform and still have normal pruning occur, even though the partition column is passed through the transform script? I think we should, and it's likely doable somehow. This would be awesome for a number of situations where you may want to expose "transformed" data to analysts without the mess of having them format their script for transform.
>
>