Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> nested UDFs on Partition column


Copy link to this message
-
Re: nested UDFs on Partition column
I don't know what the state of Hive's partition pruning is, but I
would imagine that the problem is that the two example you're giving
are fundamentally different.

1) WHERE local_date = =date_add('2011-12-07',3) ,

the udf is a function of some constants, so the constant gets
evaluated at compile time.

2) WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),

unix_timestamp() is not explicitly a constant, and so the expression
won't be simplified.

I would imagine that the constant simplification code probably doesn't
know the difference between a partition column and a real column, and
so treats everything as a real column. If local_date wasn't a
partition column, then there's no reasonable way of simplifying that
predicate at compile time.

Cheers,

Phil.

On 19 April 2012 11:50, Nitin Pawar <[EMAIL PROTECTED]> wrote:
> as per my understanding,
>
> In this case hive needs to look for all the partitions because it does not
> have the value before hand on the partition check and note the udfs are
> executed on the mapred and not on hive client side.
>
> I would suggest you write a hive query in a file and replace the partition
> value with a variable
> something like
>
> for partitionValue in values
>
>           hive $HIVEPARAMS -hiveconf  partition=$partition -e hivequery.hql
>
> and then in hivequery.sql you can refer the variable with
>
> where column_name = '${hiveconf:partition}'
>
> I may be wrong in interpreting the execution pattern of hivequery but this
> approach solved my problem
> Thanks,
> nitin
>
>
> On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <[EMAIL PROTECTED]>
> wrote:
>>
>> Hi,
>>
>> I have a table partitioned by local_date.  When I write a query with
>>
>> WHERE local_date = =date_add('2011-12-07',3) ,
>>
>> hive executes the UDF ahead and looks only into the specific partitions.
>> But when the udf becomes more complex like
>>
>> WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3),
>>
>> hive looks through all the partitions even though the above function  can
>> very well be computed ahead of time and optimize the query.  Is this
>> behaviour intentional ? And is there a workaround other than hardcoding the
>> date or using a param?
>>
>> Thanks,
>> Ramkumar
>
>
>
>
> --
> Nitin Pawar
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB