Hive >> mail # user >> slow performance when using udf


Re: slow performance when using udf
Thanks for all your advice, I'll try it out.

On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
>
> On Monday, August 15, 2011, Carl Steinbach <[EMAIL PROTECTED]> wrote:
>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>> should help some with performance.
>> On Mon, Aug 15, 2011 at 1:49 AM, wd <[EMAIL PROTECTED]> wrote:
>>>
>>> hi,
>>>
>>> I created a UDF to decode URL-encoded strings, but the MapReduce job now
>>> takes about 3 times as long as before (73 sec -> 213 sec). How can I
>>> optimize it?
>>>
>>> package com.test.hive.udf;
>>>
>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>> import java.net.URLDecoder;
>>>
>>> public final class urldecode extends UDF {
>>>
>>>    public String evaluate(final String s) {
>>>        if (s == null) { return null; }
>>>        return getString(s);
>>>    }
>>>
>>>    public static String getString(String s) {
>>>        String a;
>>>        try {
>>>            a = URLDecoder.decode(s, "UTF-8");
>>>        } catch ( Exception e) {
>>>            a = "";
>>>        }
>>>        return a;
>>>    }
>>>
>>>    public static void main(String args[]) {
>>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>        System.out.println( getString(t) );
>>>    }
>>> }
>>
>>
>
> Also, you should use class-level private members to save on object
> instantiation and garbage collection.
>
> You may also benefit from matching the argument types to what normally
> arrives from upstream. Hive converts Text to String when needed, but if the
> data normally coming into the method is Text, you could try declaring the
> argument as Text and see whether it is any faster.
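
As a rough illustration of the "class-level private members" suggestion, here is a minimal sketch in plain Java. It is not taken from the thread and it is not a Hive API: the class name ReusableUrlDecoder and its methods are hypothetical. It keeps a single byte buffer as a private field and reuses it on every call, returns the input unchanged when there is nothing to decode, and falls back to URLDecoder for input that already contains non-ASCII characters. It assumes UTF-8 and keeps the original UDF's behaviour of returning "" on malformed input. Whether this is actually faster would have to be measured on the real job.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hive): percent-decodes UTF-8 input with a reused buffer. */
final class ReusableUrlDecoder {

    // Class-level buffer: allocated once and reused, instead of a new object per row.
    private byte[] buf = new byte[256];

    /** Returns the decoded string, null for null input, or "" if the input is malformed. */
    String decode(final String s) {
        if (s == null) {
            return null;
        }
        final int n = s.length();
        // For ASCII input the decoded form is never longer than the input.
        if (buf.length < n) {
            buf = new byte[Math.max(n, buf.length * 2)];
        }
        int len = 0;
        boolean changed = false;
        for (int i = 0; i < n; i++) {
            final char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= n) {
                    return ""; // truncated escape such as "%E" or "%"
                }
                final int hi = hex(s.charAt(i + 1));
                final int lo = hex(s.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    return ""; // non-hex digits after '%'
                }
                buf[len++] = (byte) ((hi << 4) | lo);
                i += 2;
                changed = true;
            } else if (c == '+') {
                buf[len++] = (byte) ' '; // form encoding: '+' means space
                changed = true;
            } else if (c < 0x80) {
                buf[len++] = (byte) c;
            } else {
                // Raw non-ASCII input is rare for URL-encoded data; use the JDK decoder.
                return fallback(s);
            }
        }
        // Nothing was escaped: hand back the input without allocating a new String.
        return changed ? new String(buf, 0, len, StandardCharsets.UTF_8) : s;
    }

    // Value of one hex digit, or -1 if the character is not a hex digit.
    private static int hex(final char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Slow path that mirrors the original UDF, with an explicit charset.
    private static String fallback(final String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (Exception e) {
            return "";
        }
    }
}
```

In a UDF, one instance of such a helper would be held in a private field and called from evaluate(). Following the second suggestion, evaluate() could also be declared to take and return Hadoop Text instead of String, with the returned Text object reused in the same way. That variant is not shown here because it depends on the Hive and Hadoop jars.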