Re: Custom UserDefinedFunction in Hive
Have you tried running EXPLAIN[1] on your query? I usually use it to get a
better understanding of what a query is actually doing, and at other times
for debugging.
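
If EXPLAIN turns out to show a scan over every partition, one possible
fallback is to compute the date on the client and put it into the query as a
literal. A minimal sketch in plain Java, assuming the table and column names
from this thread; the class YesterdayQuery and its methods are made up for
illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper (not from this thread): builds the query with
// yesterday's date inlined as a string literal, so the partition filter
// is a plain constant rather than a function call.
public final class YesterdayQuery {

    // Formats the day before `now` using a SimpleDateFormat pattern.
    public static String yesterday(Date now, String pattern) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(now);
        cal.add(Calendar.DATE, -1);
        return new SimpleDateFormat(pattern, Locale.ROOT).format(cal.getTime());
    }

    // Same query as in the thread, with dt compared against a literal.
    public static String build(Date now) {
        return "SELECT * FROM REALTIME WHERE dt = '"
                + yesterday(now, "yyyyMMdd") + "' LIMIT 10";
    }

    public static void main(String[] args) {
        System.out.println(build(new Date()));
    }
}
```

The printed statement can then be passed to the Hive CLI, which also gets
around the missing variable substitution in Hive 0.6.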

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
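
Regarding Jan's suggestion further down about calling 'new' only once: a
rough sketch of what that might look like, assuming the UDF instance is used
by one thread at a time (SimpleDateFormat and Calendar are not thread-safe).
The class name CachedYesterdayDate is hypothetical, and the UDF superclass is
left out so that it compiles without Hive on the classpath:

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;

// Sketch only: in the real UDF this class would extend UDF, as in the
// quoted code below.
public final class CachedYesterdayDate {

    // One formatter per pattern, created on first use and then reused.
    private final Map<String, DateFormat> formats = new HashMap<String, DateFormat>();

    // Allocated once at construction; reset to the current time on each call.
    private final Calendar cal = Calendar.getInstance();

    public String evaluate(final String format) {
        if (format == null) {
            return null; // added guard, not in the original function
        }
        DateFormat dateFormat = formats.get(format);
        if (dateFormat == null) {
            dateFormat = new SimpleDateFormat(format);
            formats.put(format, dateFormat);
        }
        cal.setTimeInMillis(System.currentTimeMillis());
        cal.add(Calendar.DATE, -1);
        return dateFormat.format(cal.getTime());
    }
}
```

Since the pattern arrives as an argument, the formatter is cached per pattern
instead of being built in the constructor.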

On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:

> Hi Jan,
>
>
> I figured that out, and it is working fine for me now. My only question
> is about a query like this:
>
>
>
> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>
>
>
> Will the above query be evaluated as below?
>
>
>
> SELECT * FROM REALTIME where dt= '20120806' LIMIT 10;
>
>
>
> So that means it will look for data only in the corresponding dt partition
> (20120806), since the table is partitioned on the dt column? And it will
> not scan the whole table, right?
>
>
>
> *Raihan Jamal*
>
>
>
> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>
>> Hi Jamal,
>>
>> Check that the function really returns what it should and that your data
>> is really in yyyyMMdd format. You can do this with a simple query like this:
>>
>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>
>> I don't see anything wrong with the function itself, it works well for me
>> (although I tested it in Hive 0.7.1). The only thing I would change about
>> it would be to optimize it by calling 'new' only at the time of
>> construction and reusing the object when the function is called, but that
>> should not affect the functionality at all.
>>
>> Best regards,
>> Jan
>>
>>
>>
>>
>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
>>
>>> *Problem*
>>>
>>> I created the UserDefinedFunction below to get yesterday's date in
>>> the format I want; the format is passed into the method from the
>>> query.
>>>
>>>
>>>
>>> package com.example.hive.udf;
>>>
>>> import java.text.DateFormat;
>>> import java.text.SimpleDateFormat;
>>> import java.util.Calendar;
>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>>
>>> public final class YesterdayDate extends UDF {
>>>
>>>     // Returns yesterday's date formatted with the given pattern.
>>>     public String evaluate(final String format) {
>>>         DateFormat dateFormat = new SimpleDateFormat(format);
>>>         Calendar cal = Calendar.getInstance();
>>>         cal.add(Calendar.DATE, -1);
>>>         return dateFormat.format(cal.getTime()).toString();
>>>     }
>>> }
>>>
>>>
>>>
>>>
>>>
>>> After adding the jar to the classpath and creating the temporary
>>> function yesterdaydate as below, my query always returns zero
>>> results:
>>>
>>>
>>>
>>> hive> create temporary function yesterdaydate as 'com.example.hive.udf.YesterdayDate';
>>>
>>> OK
>>>
>>> Time taken: 0.512 seconds
>>>
>>>
>>>
>>> Below is the query I am running:
>>>
>>>
>>>
>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>
>>> OK
>>>
>>> I always get zero results back, even though the table does contain data
>>> for Aug 5th.
>>>
>>>
>>>
>>> What am I doing wrong? Any suggestions will be appreciated.
>>>
>>>
>>>
>>>
>>>
>>> NOTE: I am working with Hive 0.6, which doesn't support variable
>>> substitution, so I cannot use hiveconf here. The above table is
>>> partitioned on the dt (date) column.
>>>
>>
>>
>
--
Swarnim