Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Re: Custom UserDefinedFunction in Hive


+
Jan Dolinár 2012-08-07, 17:37
+
Raihan Jamal 2012-08-07, 18:07
+
Vijay 2012-08-07, 18:11
+
Raihan Jamal 2012-08-07, 18:26
+
Vijay 2012-08-07, 18:31
+
Jan Dolinár 2012-08-07, 18:18
+
Raihan Jamal 2012-08-07, 18:24
+
Jan Dolinár 2012-08-07, 18:32
+
Techy Teck 2012-08-07, 18:39
Copy link to this message
-
Re: Custom UserDefinedFunction in Hive
Right, no need for the function at all. Sorry it is getting late here and
my brain refuses to work any more :)
On Tue, Aug 7, 2012 at 8:39 PM, Techy Teck <[EMAIL PROTECTED]> wrote:

> Then that means I don't need to create that userdefinedfunction right?
>
>
>
> On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>
>> Hi Jamal,
>>
>> date is standard linux/unix tool, see the manual page:
>> http://linux.die.net/man/1/date.
>>
>> The $(...) tells the shell to execute the command and insert it's output
>> into the string. So in this case it will execute command
>> date -d -1day +%Y%m%d
>> which returns yesterday date in the format you need.
>>
>> Jan
>>
>>
>> On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <[EMAIL PROTECTED]>wrote:
>>
>>> Yes it supports -e option, but in your query what is date?
>>>
>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>>> AS 'com.example.hive.udf.YesterdayDate';
>>> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
>>>
>>>
>>>
>>> *Raihan Jamal*
>>>
>>>
>>>
>>> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <[EMAIL PROTECTED]>wrote:
>>>
>>>> By the way, even without hiveconf, you can run hive from shell like
>>>> this to achieve what you want using shell capabilities:
>>>>
>>>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>>>> AS 'com.example.hive.udf.YesterdayDate';
>>>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;"
>>>>
>>>> At least if hive 6.0 supports -e option, I don't have where to check
>>>> that.
>>>>
>>>> Jan
>>>>
>>>>
>>>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Given the implementation of the UDF, I don't think hive would be able
>>>>> to use partition pruning. Especially the version you're using. I'd
>>>>> really recommend upgrading to a later version that has the hiveconf
>>>>> support. That can save a lot of trouble rather than trying to get
>>>>> things working on 0.6
>>>>>
>>>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> > Hi Jan,
>>>>> >
>>>>> > I have date in different format also, so that is the reason I was
>>>>> thinking
>>>>> > to do by this approach. How can I make sure this will work on the
>>>>> selected
>>>>> > partition only and it will not scan the entire table. I will add your
>>>>> > suggestion in my UDF as deterministic thing.
>>>>> >
>>>>> > My simple question here is- How to get the Yesterdays date which I
>>>>> can use
>>>>> > on the Date Partition I cannot use hiveconf here as I am working
>>>>> with Hive
>>>>> > 0.6
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Raihan Jamal
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>> >>
>>>>> >> I'm afraid that  he query
>>>>> >>
>>>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>>> >>
>>>>> >> will scan entire table, because the functions is evaluated at
>>>>> runtime, so
>>>>> >> Hive doesn't know what the value is when it decides which files to
>>>>> scan. I
>>>>> >> am not 100% sure though, you should try it.
>>>>> >>
>>>>> >> Also, you might want to try to add annotation to your UDF saying
>>>>> that the
>>>>> >> function is deterministic:
>>>>> >> @UDFType(deterministic=false)
>>>>> >>
>>>>> >> I think Hive might be clever enough to evaluate it early enough to
>>>>> use the
>>>>> >> partition pruning correctly, since it operates on constant
>>>>> expression. But
>>>>> >> again, I'm not really sure, maybe someone with deeper knowledge of
>>>>> Hive
>>>>> >> optimizations will tell us more. It is actually quite interesting
>>>>> question.
>>>>> >>
>>>>> >> Another way to help Hive with the optimizations might be to skip
>>>>> passing
>>>>> >> the format string argument, if you have all dates in same format,
>>>>> you can
>>>>> >> call the function just like 'yesterdaydate()' and hardcode the
>>>>> format in the
>>>>> >> function.
>>>>> >>
>>>>> >> Jan
>>>>> >>
>>>>> >>
+
Raihan Jamal 2012-08-07, 18:41
+
Jan Dolinár 2012-08-07, 17:56
+
Raihan Jamal 2012-08-07, 18:03
+
kulkarni.swarnim@...) 2012-08-07, 17:35
+
Bejoy Ks 2012-08-08, 10:49
+
Venkatesh Kavuluri 2012-08-08, 23:41
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB