Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> nested UDFs on Partition column

Ramkumar 2012-04-19, 09:57
Copy link to this message
Re: nested UDFs on Partition column
as per my understanding,

In this case hive needs to look for all the partitions because it does not
have the value before hand on the partition check and note the udfs are
executed on the mapred and not on hive client side.

I would suggest you write a hive query in a file and replace the partition
value with a variable
something like

for partitionValue in values

          hive $HIVEPARAMS -hiveconf  partition=$partition -e hivequery.hql

and then in hivequery.sql you can refer the variable with

where column_name = '${hiveconf:partition}'

I may be wrong in interpreting the execution pattern of hivequery but this
approach solved my problem

On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <[EMAIL PROTECTED]>wrote:

> Hi,
> I have a table partitioned by local_date.  When I write a query with
> WHERE local_date = =date_add('2011-12-07',3) ,
> hive executes the UDF ahead and looks only into the specific partitions.
> But when the udf becomes more complex like
> WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),
> hive looks through all the partitions even though the above function  can
> very well be computed ahead of time and optimize the query.  Is this
> behaviour intentional ? And is there a workaround other than hardcoding the
> date or using a param?
> Thanks,
> Ramkumar

Nitin Pawar
Philip Tromans 2012-04-19, 15:35
Philip Tromans 2012-04-19, 15:38
Ramkumar 2012-05-01, 13:38
Edward Capriolo 2012-05-01, 13:43