Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Re: Custom UserDefinedFunction in Hive


+
Jan Dolinár 2012-08-07, 17:37
+
Raihan Jamal 2012-08-07, 18:07
+
Vijay 2012-08-07, 18:11
+
Raihan Jamal 2012-08-07, 18:26
Copy link to this message
-
Re: Custom UserDefinedFunction in Hive
You actually don't need hive on the whole cluster. That's the beauty
of it. You only need it on the client machine where you're submitting
hive jobs. Of course the metadata store does need to be upgraded for
newer versions so that might still be a problem.

On Tue, Aug 7, 2012 at 11:26 AM, Raihan Jamal <[EMAIL PROTECTED]> wrote:
> Hi Vijay,
>
> Thanks for the suggestion,  If upgrading to Hive was under my control then I
> would have done for sure, but I am working in a company and they are running
> Hive 0.6 on all the cluster, And I told them to upgrade the Hive version but
> they said it will take few months for them to do this. And I don't know why
> they are saying like this, so that is the reason I was doing like this.
>
> Any suggestions will be appreciated to make this thing work
>
>
> Raihan Jamal
>
>
>
> On Tue, Aug 7, 2012 at 11:11 AM, Vijay <[EMAIL PROTECTED]> wrote:
>>
>> Given the implementation of the UDF, I don't think hive would be able
>> to use partition pruning. Especially the version you're using. I'd
>> really recommend upgrading to a later version that has the hiveconf
>> support. That can save a lot of trouble rather than trying to get
>> things working on 0.6
>>
>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <[EMAIL PROTECTED]>
>> wrote:
>> > Hi Jan,
>> >
>> > I have date in different format also, so that is the reason I was
>> > thinking
>> > to do by this approach. How can I make sure this will work on the
>> > selected
>> > partition only and it will not scan the entire table. I will add your
>> > suggestion in my UDF as deterministic thing.
>> >
>> > My simple question here is- How to get the Yesterdays date which I can
>> > use
>> > on the Date Partition I cannot use hiveconf here as I am working with
>> > Hive
>> > 0.6
>> >
>> >
>> >
>> >
>> > Raihan Jamal
>> >
>> >
>> >
>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <[EMAIL PROTECTED]>
>> > wrote:
>> >>
>> >> I'm afraid that  he query
>> >>
>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>> >>
>> >> will scan entire table, because the functions is evaluated at runtime,
>> >> so
>> >> Hive doesn't know what the value is when it decides which files to
>> >> scan. I
>> >> am not 100% sure though, you should try it.
>> >>
>> >> Also, you might want to try to add annotation to your UDF saying that
>> >> the
>> >> function is deterministic:
>> >> @UDFType(deterministic=false)
>> >>
>> >> I think Hive might be clever enough to evaluate it early enough to use
>> >> the
>> >> partition pruning correctly, since it operates on constant expression.
>> >> But
>> >> again, I'm not really sure, maybe someone with deeper knowledge of Hive
>> >> optimizations will tell us more. It is actually quite interesting
>> >> question.
>> >>
>> >> Another way to help Hive with the optimizations might be to skip
>> >> passing
>> >> the format string argument, if you have all dates in same format, you
>> >> can
>> >> call the function just like 'yesterdaydate()' and hardcode the format
>> >> in the
>> >> function.
>> >>
>> >> Jan
>> >>
>> >>
>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <[EMAIL PROTECTED]>
>> >> wrote:
>> >>>
>> >>> Hi Jan,
>> >>>
>> >>>
>> >>>
>> >>> I figured that out, it is working fine for me now. The only question I
>> >>> have is, if I am doing like this-
>> >>>
>> >>>
>> >>>
>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>> >>>
>> >>>
>> >>>
>> >>> Then the above query will be evaluated as below right?
>> >>>
>> >>>
>> >>>
>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>> >>>
>> >>>
>> >>>
>> >>> So that means it will look for data in the corresponding dt partition
>> >>> (20120806) only right as above table is partitioned on dt column ? And
>> >>> it
>> >>> will not scan the whole table right?
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Raihan Jamal
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <[EMAIL PROTECTED]>
>> >>> wrote:
>> >>>>
>>
+
Jan Dolinár 2012-08-07, 18:18
+
Raihan Jamal 2012-08-07, 18:24
+
Jan Dolinár 2012-08-07, 18:32
+
Techy Teck 2012-08-07, 18:39
+
Jan Dolinár 2012-08-07, 18:42
+
Raihan Jamal 2012-08-07, 18:41
+
Jan Dolinár 2012-08-07, 17:56
+
Raihan Jamal 2012-08-07, 18:03
+
kulkarni.swarnim@...) 2012-08-07, 17:35
+
Bejoy Ks 2012-08-08, 10:49
+
Venkatesh Kavuluri 2012-08-08, 23:41