Hive >> mail # user >> View Partition Pruning not Occurring during transform


John Omernik 2012-10-10, 19:04
shrikanth shankar 2012-10-10, 20:24
Re: View Partition Pruning not Occurring during transform
Agreed. That's the conclusion we came to as well. So it's less of a bug and
more of a feature request. I think one of the main advantages of Hive is
the flexibility in allowing non-technical users to run basic queries
without having to think about the transform stuff (i.e. we in the IT shop
can set up the transform). I like the annotation idea, where the
partition specs can somehow be pushed through (identified in some other way, etc.).
I am new to the Apache/JIRA world; what would you recommend for getting
this into a feature request for consideration? I am not a Java programmer,
so my idea may need to be paired with a champion to help implement it :)

On Wed, Oct 10, 2012 at 3:24 PM, shrikanth shankar <[EMAIL PROTECTED]> wrote:

> I assume the reason for this is that the Hive compiler has no way of
> determining that the 'day' that is input into the transform script is the
> same 'day' that is output from the transform script. Even if it did, it's
> unclear if pushing down would be legal without knowing the semantics of the
> transformation. Any optimization to be done here will likely need an
> annotation somewhere to say that certain columns in the output of a
> transform refer to specific columns in the input of a transform for
> predicate pushdown purposes (and that such pushdown is legal for this
> transformation).
>
> thanks,
> Shrikanth
> On Oct 10, 2012, at 12:04 PM, John Omernik wrote:
>
> > Greetings all, I am trying to incorporate a TRANSFORM into a view (so we
> > can abstract the transform script away from the user).
> >
> >
> >
> > As a test, I have a table partitioned on day (in YYYY-MM-DD format)
> > with lots of partitions,
> >
> > and I tried this:
> >
> > CREATE VIEW view_transform as
> > Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table;
> >
> > The reason I used 'cat' in my test is that, if this works, I will distribute
> > my transform scripts to each node manually. I know each node has cat, so
> > this works as a test.
> >
> > When I run
> >
> > SELECT * from view_transform where day = '2012-10-08';
> >
> > 10,432 map tasks get spun up.
> >
> > If I rewrite the view to be
> >
> > CREATE VIEW view_transform as
> > Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table
> > where day = '2012-10-08';
> >
> > then only 16 map tasks get spun up (the desired behavior, but the
> > pruning is happening in the view, not in the query).
> >
> > Thus I wanted input on whether this should be considered a bug, i.e.
> > should we be able to define a partition spec in a view that uses a
> > transform, so that normal pruning occurs even though the partition
> > spec will be passed to the transform script? I think we should, and it's
> > likely doable somehow. This would be awesome for a number of situations
> > where you may want to expose "transformed" data to analysts without the
> > mess of having them format their script for transform.
> >
> >
>
>
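
The legality concern raised in the quoted reply can be illustrated with a small toy model. This is only a sketch in Python, not Hive code: rows are (day, ip) tuples, a "transform" is any function from rows to rows, and the names shift_day, filter_after and filter_before are hypothetical helpers invented for the illustration. It compares filtering on day after the transform (the behavior described in the thread) with filtering before it (what a pushed-down predicate would amount to).

```python
# Toy model of predicate pushdown through an opaque transform.
# Rows are (day, ip) tuples; all function names here are illustrative only.

def cat(rows):
    """Identity transform, standing in for the 'cat' script in the test view."""
    return list(rows)

def shift_day(rows):
    """Hypothetical transform that rewrites the day column of some rows."""
    return [("2012-10-08" if day == "2012-10-07" else day, ip) for day, ip in rows]

def filter_after(transform, rows, day):
    """No pushdown: transform every row, then filter on the output day."""
    return [r for r in transform(rows) if r[0] == day]

def filter_before(transform, rows, day):
    """Pushdown: filter (prune) on the input day first, then transform."""
    return transform([r for r in rows if r[0] == day])

rows = [("2012-10-07", "10.0.0.1"), ("2012-10-08", "10.0.0.2")]
target = "2012-10-08"

# With the identity transform, both orders return the same rows.
cat_after = filter_after(cat, rows, target)
cat_before = filter_before(cat, rows, target)

# With a transform that rewrites day, pushing the filter down loses a row.
shift_after = filter_after(shift_day, rows, target)
shift_before = filter_before(shift_day, rows, target)

print(cat_after == cat_before)      # same result for 'cat'
print(shift_after == shift_before)  # results differ for shift_day
```

For 'cat' the two orders agree, which is why pruning would be safe in the test view. For a transform that changes day they disagree, which is why the compiler would need some annotation that the output column is the unchanged input column before it could prune partitions.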
Edward Capriolo 2012-10-11, 13:32
John Omernik 2012-10-11, 19:04