
Hive user mailing list


View Partition Pruning not Occurring during transform
Greetings all,

I am trying to incorporate a TRANSFORM into a view so that we can abstract the transform script away from the user.

As a test, I have a table partitioned on day (in YYYY-MM-DD format) with lots of partitions, and I tried this:

CREATE VIEW view_transform AS
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip) FROM source_table;

I used 'cat' in my test because I know every node already has it, so it works as a stand-in. If this approach works, I will distribute my real transform scripts to each node manually.
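For context on what 'cat' is standing in for: by default Hive's TRANSFORM writes each input row to the script's standard input as one line of tab-separated column values (NULL as \N), and splits each line the script writes to standard output on tabs to form the AS columns. A minimal Python sketch of an equivalent pass-through script, assuming that default format and the two columns (day, ip) used here; the function name identity_transform is hypothetical:

```python
import sys
from typing import Iterable, Iterator


def identity_transform(lines: Iterable[str]) -> Iterator[str]:
    """Pass each tab-separated (day, ip) row through unchanged, like `cat`."""
    for line in lines:
        # Hive sends one row per line; columns are separated by tabs.
        day, ip = line.rstrip("\n").split("\t", 1)
        # Emit the same columns in the same order, matching AS (day, ip).
        yield f"{day}\t{ip}\n"


if __name__ == "__main__":
    # Read rows from Hive on stdin and write transformed rows to stdout.
    sys.stdout.writelines(identity_transform(sys.stdin))
```

A script like this, copied to every node as described above, could replace 'cat' in the USING clause without changing the view's output.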

When I run

SELECT * FROM view_transform WHERE day = '2012-10-08';

10,432 map tasks get spun up.

If I rewrite the view as

CREATE VIEW view_transform AS
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip) FROM source_table
WHERE day = '2012-10-08';

then only 16 map tasks get spun up. That is the desired behavior, but the pruning is happening in the view, not in the query.

So I wanted input on whether this should be considered a bug. That is, should we be able to define a view that uses a TRANSFORM and still get normal partition pruning when a query filters on the partition column, even though that column is also passed to the transform script? I think we should, and it's likely doable somehow. This would be awesome for a number of situations where you want to expose "transformed" data for analysis without the mess of having users format their queries for TRANSFORM themselves.
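One plausible reason the predicate is not pushed through the TRANSFORM (an inference, not something stated in this thread): the script is opaque to Hive, so nothing guarantees that the output day column equals the input partition column. The hypothetical Python sketch below compares "transform, then filter on output day" (what the query means) with "filter the input partition, then transform" (what pruning would do). The two agree for a pass-through script like cat, but not for a script that rewrites day; shift_day and the sample rows are made up for illustration.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

Row = Tuple[str, str]  # (day, ip)


def identity(row: Row) -> Row:
    # Behaves like `cat`: output day equals input day.
    return row


def shift_day(row: Row) -> Row:
    # Hypothetical script that rewrites the day column (adds one day).
    day, ip = row
    shifted = date.fromisoformat(day) + timedelta(days=1)
    return (shifted.isoformat(), ip)


def filter_after(rows: List[Row], script: Callable[[Row], Row], day: str) -> List[Row]:
    # Query semantics: transform every row, then filter on the output day.
    return [out for out in map(script, rows) if out[0] == day]


def filter_before(rows: List[Row], script: Callable[[Row], Row], day: str) -> List[Row]:
    # Pruning semantics: read only the matching input partition, then transform.
    return [script(row) for row in rows if row[0] == day]


def pruning_is_safe(rows: List[Row], script: Callable[[Row], Row], day: str) -> bool:
    # Pruning preserves results only if both orders give the same rows.
    return filter_after(rows, script, day) == filter_before(rows, script, day)


if __name__ == "__main__":
    rows = [("2012-10-07", "10.0.0.1"), ("2012-10-08", "10.0.0.2")]
    for script in (identity, shift_day):
        print(script.__name__, "pruning safe:", pruning_is_safe(rows, script, "2012-10-08"))
```

Pushing the filter below the script only preserves results when the output column is known to pass the partition column through unchanged, as it does with cat.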