Hive >> mail # user >> View Partition Pruning not Occurring during transform


View Partition Pruning not Occurring during transform
Greetings all. I am trying to incorporate a TRANSFORM into a view (so we
can abstract the transform script away from the user).

As a test, I have a table partitioned on day (in YYYY-MM-DD format) with
lots of partitions,

and I tried this:

CREATE VIEW view_transform as
Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table;

The reason I used 'cat' in my test is that, if this works, I will distribute
my transform scripts to each node manually. I know each node already has cat,
so it works as a stand-in for the test.

When I run

SELECT * from view_transform where day = '2012-10-08';

10,432 map tasks get spun up.

If I rewrite the view to be

CREATE VIEW view_transform as
Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table where
day = '2012-10-08';

then only 16 map tasks get spun up (the desired behavior, but the pruning
is happening in the view, not in the query).
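
One way to confirm where the pruning is lost, rather than inferring it from
the map task count, is to compare the extended plans for the two cases. This
is only a sketch: it assumes the view and table names used above, and the
exact layout of the plan output depends on the Hive version.

```sql
-- Plan for the query against the first (unfiltered) view.
-- If pruning is not happening, the map stage should reference
-- every partition of source_table.
EXPLAIN EXTENDED
SELECT * FROM view_transform WHERE day = '2012-10-08';

-- Plan for the same transform with the filter written directly
-- against the partition column, for comparison.
EXPLAIN EXTENDED
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip)
FROM source_table
WHERE day = '2012-10-08';
```

If the first plan lists all partitions as inputs and the second lists only
day=2012-10-08, the predicate is not being pushed below the script operator.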

So I wanted input on whether this should be considered a bug. That is,
should a query be able to supply a partition spec against a view that uses
a transform, and have normal pruning occur even though the partition column
is also passed to the transform script? I think it should, and it is likely
doable somehow. This would be awesome for a number of situations where you
want to expose "transformed" data to analysts without making them write the
TRANSFORM clause and script invocation themselves.
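
The question comes down to where the predicate sits relative to the script.
The two shapes below are a sketch using the same table and columns as the
test above; the aliases t and s are arbitrary. In the first, the filter is
applied to the script's output column named day, which is what a WHERE clause
on the view amounts to. In the second, the filter is applied to the partition
column before the rows reach the script, which matches the rewritten view.

```sql
-- Shape 1: filter on the transform's output column.
-- Equivalent to SELECT * FROM view_transform WHERE day = '2012-10-08'.
SELECT t.day, t.ip
FROM (
  SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip)
  FROM source_table
) t
WHERE t.day = '2012-10-08';

-- Shape 2: filter on the partition column, below the transform.
SELECT TRANSFORM (s.day, s.ip) USING 'cat' AS (day, ip)
FROM (
  SELECT day, ip
  FROM source_table
  WHERE day = '2012-10-08'
) s;
```

With 'cat' the two return the same rows, but in general a script may rewrite
the day value, so the output column named day in shape 1 is not guaranteed to
be the partition column.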
shrikanth shankar 2012-10-10, 20:24
John Omernik 2012-10-11, 01:08
Edward Capriolo 2012-10-11, 13:32
John Omernik 2012-10-11, 19:04