-
Custom UserDefinedFunction in Hive
Raihan Jamal 2012-08-07, 01:39
Problem
I created the UDF below to return yesterday's date in a format of my choosing; the format string is passed to the method from the query.
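package com.example.hive.udf;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.apache.hadoop.hive.ql.exec.UDF;

public final class YesterdayDate extends UDF {

    // Returns yesterday's date formatted with the given SimpleDateFormat pattern.
    public String evaluate(final String format) {
        DateFormat dateFormat = new SimpleDateFormat(format);
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return dateFormat.format(cal.getTime());
    }
}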
I added the jar to the classpath and created the temporary function yesterdaydate, but whenever I run a query that uses it, I always get zero results back.
hive> create temporary function yesterdaydate as 'com.example.hive.udf.YesterdayDate';
OK
Time taken: 0.512 seconds
This is the query I am running:
hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
OK
It always returns zero rows, even though the table contains data for Aug 5th.
What am I doing wrong? Any suggestions will be appreciated.
NOTE: I am working with Hive 0.6, which does not support variable substitution, so I cannot use hiveconf here. The table above is partitioned on the dt (date) column.
-
Re: Custom UserDefinedFunction in Hive
Jan Dolinár 2012-08-07, 05:56
Hi Jamal,
Check whether the function really returns what it should and whether your data really is in yyyyMMdd format. You can do this with a simple query like this:
SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
I don't see anything wrong with the function itself; it works well for me (although I tested it in Hive 0.7.1). The only change I would make is an optimization: call 'new' only at construction time and reuse the objects when the function is called. That should not affect the functionality at all.
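A minimal sketch of that optimization, under a few assumptions that are not in the original code: the class name YesterdayDateCached and the helper yesterdayAt are hypothetical, the class does not extend UDF so that it compiles without the Hive jars (in Hive it would extend org.apache.hadoop.hive.ql.exec.UDF), and each instance is assumed to be used by a single thread because SimpleDateFormat is not thread-safe. Since the format is only known at call time, the formatter is created on first use per pattern rather than strictly in the constructor.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of YesterdayDate that reuses its helper objects.
// Assumption: in Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF.
final class YesterdayDateCached {

    // One formatter per pattern, created on first use and reused afterwards.
    // Assumption: an instance is only ever used from one thread.
    private final Map<String, SimpleDateFormat> formats =
            new HashMap<String, SimpleDateFormat>();

    // Calendar allocated once at construction and reset on every call.
    private final Calendar cal = Calendar.getInstance();

    // Same contract as the original evaluate(): yesterday's date in the given format.
    public String evaluate(final String format) {
        return yesterdayAt(format, System.currentTimeMillis());
    }

    // Helper with an explicit clock value, added here only to make the logic testable.
    String yesterdayAt(final String format, final long nowMillis) {
        if (format == null) {
            return null;
        }
        SimpleDateFormat fmt = formats.get(format);
        if (fmt == null) {
            fmt = new SimpleDateFormat(format);
            formats.put(format, fmt);
        }
        // Reset the shared calendar to "now" before stepping back one day,
        // otherwise repeated calls would drift further into the past.
        cal.setTimeInMillis(nowMillis);
        cal.add(Calendar.DATE, -1);
        return fmt.format(cal.getTime());
    }
}
```

The reset of the shared Calendar on every call is the part that is easy to get wrong: reusing the object without it would subtract one more day each time the function runs.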
Best regards,
Jan
-
Re: Custom UserDefinedFunction in Hive
Raihan Jamal 2012-08-07, 06:18
I tested the function from a main method by printing its result, and it works fine: it returns yesterday's date.
Today's date is Aug 6th, so the query should be for Aug 5th. Written with a literal date, it works fine for me:
SELECT * FROM REALTIME where dt= '20120805' LIMIT 10;
Instead of hard-coding the date, I want to write the query as below. It should give the same result as the query above, but when I run it I get zero rows back.
SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
So something must be wrong with the way I am doing it?
Raihan Jamal
-
Re: Custom UserDefinedFunction in Hive
Raihan Jamal 2012-08-07, 17:20
Hi Jan, I figured that out, and it is working fine for me now. The only question I have left is about a query like this:
SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
It will be evaluated as below, right?
SELECT * FROM REALTIME where dt= '20120806' LIMIT 10;
Since the table is partitioned on the dt column, does that mean the query will look for data only in the corresponding dt partition (20120806) and will not scan the whole table?
Raihan Jamal