Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - nested UDFs on Partition column


Copy link to this message
-
Re: nested UDFs on Partition column
Philip Tromans 2012-04-19, 15:35
I don't know what the state of Hive's partition pruning is, but I
would imagine that the problem is that the two example you're giving
are fundamentally different.

1) WHERE local_date = =date_add('2011-12-07',3) ,

the udf is a function of some constants, so the constant gets
evaluated at compile time.

2) WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),

unix_timestamp() is not explicitly a constant, and so the expression
won't be simplified.

I would imagine that the constant simplification code probably doesn't
know the difference between a partition column and a real column, and
so treats everything as a real column. If local_date wasn't a
partition column, then there's no reasonable way of simplifying that
predicate at compile time.

Cheers,

Phil.

On 19 April 2012 11:50, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> as per my understanding,
>
> In this case hive needs to look for all the partitions because it does not
> have the value before hand on the partition check and note the udfs are
> executed on the mapred and not on hive client side.
>
> I would suggest you write a hive query in a file and replace the partition
> value with a variable
> something like
>
> for partitionValue in values
>
>           hive $HIVEPARAMS -hiveconf  partition=$partition -e hivequery.hql
>
> and then in hivequery.sql you can refer the variable with
>
> where column_name = '${hiveconf:partition}'
>
> I may be wrong in interpreting the execution pattern of hivequery but this
> approach solved my problem
> Thanks,
> nitin
>
>
> On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi,
>>
>> I have a table partitioned by local_date.  When I write a query with
>>
>> WHERE local_date = =date_add('2011-12-07',3) ,
>>
>> hive executes the UDF ahead and looks only into the specific partitions.
>> But when the udf becomes more complex like
>>
>> WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),
>>
>> hive looks through all the partitions even though the above function  can
>> very well be computed ahead of time and optimize the query.  Is this
>> behaviour intentional ? And is there a workaround other than hardcoding the
>> date or using a param?
>>
>> Thanks,
>> Ramkumar
>
>
>
>
> --
> Nitin Pawar
>