Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Re: Custom UserDefinedFunction in Hive


Copy link to this message
-
Re: Custom UserDefinedFunction in Hive
Vijay 2012-08-07, 18:11
Given the implementation of the UDF, I don't think hive would be able
to use partition pruning. Especially the version you're using. I'd
really recommend upgrading to a later version that has the hiveconf
support. That can save a lot of trouble rather than trying to get
things working on 0.6

On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
> Hi Jan,
>
> I have date in different format also, so that is the reason I was thinking
> to do by this approach. How can I make sure this will work on the selected
> partition only and it will not scan the entire table. I will add your
> suggestion in my UDF as deterministic thing.
>
> My simple question here is- How to get the Yesterdays date which I can use
> on the Date Partition I cannot use hiveconf here as I am working with Hive
> 0.6
>
>
>
>
> Raihan Jamal
>
>
>
> On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>>
>> I'm afraid that  he query
>>
>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>
>> will scan entire table, because the functions is evaluated at runtime, so
>> Hive doesn't know what the value is when it decides which files to scan. I
>> am not 100% sure though, you should try it.
>>
>> Also, you might want to try to add annotation to your UDF saying that the
>> function is deterministic:
>> @UDFType(deterministic=false)
>>
>> I think Hive might be clever enough to evaluate it early enough to use the
>> partition pruning correctly, since it operates on constant expression. But
>> again, I'm not really sure, maybe someone with deeper knowledge of Hive
>> optimizations will tell us more. It is actually quite interesting question.
>>
>> Another way to help Hive with the optimizations might be to skip passing
>> the format string argument, if you have all dates in same format, you can
>> call the function just like 'yesterdaydate()' and hardcode the format in the
>> function.
>>
>> Jan
>>
>>
>> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hi Jan,
>>>
>>>
>>>
>>> I figured that out, it is working fine for me now. The only question I
>>> have is, if I am doing like this-
>>>
>>>
>>>
>>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>
>>>
>>>
>>> Then the above query will be evaluated as below right?
>>>
>>>
>>>
>>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>>>
>>>
>>>
>>> So that means it will look for data in the corresponding dt partition
>>> (20120806) only right as above table is partitioned on dt column ? And it
>>> will not scan the whole table right?
>>>
>>>
>>>
>>>
>>> Raihan Jamal
>>>
>>>
>>>
>>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi Jamal,
>>>>
>>>> Check if the function really returns what it should and that your data
>>>> are really in yyyyMMdd format. You can do this by simple query like this:
>>>>
>>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>>>
>>>> I don't see anything wrong with the function itself, it works well for
>>>> me (although I tested it in hive 0.7.1). The only thing I would change about
>>>> it would be to optimize it by calling 'new' only at the time of construction
>>>> and reusing the object when the function is called, but that should not
>>>> affect the functionality at all.
>>>>
>>>> Best regards,
>>>> Jan
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Problem
>>>>>
>>>>> I created the below UserDefinedFunction to get the yesterday's day in
>>>>> the format I wanted as I will be passing the format into this below method
>>>>> from the query.
>>>>>
>>>>>
>>>>>
>>>>> public final class YesterdayDate extends UDF {
>>>>>
>>>>>
>>>>>
>>>>>                 public String evaluate(final String format) {
>>>>>
>>>>>                                 DateFormat dateFormat = new
>>>>> SimpleDateFormat(format);
>>>>>
>>>>>                                 Calendar cal = Calendar.getInstance();