Hive >> mail # user >> slow performance when using udf


Re: slow performance when using udf
Thanks for all your advice, I'll try it out.

On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
>
> On Monday, August 15, 2011, Carl Steinbach <[EMAIL PROTECTED]> wrote:
>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>> should help some with performance.
>> On Mon, Aug 15, 2011 at 1:49 AM, wd <[EMAIL PROTECTED]> wrote:
>>>
>>> hi,
>>>
>>> I created a UDF to decode URL-encoded strings, but the MapReduce job now
>>> takes about 3 times as long as before (73 sec -> 213 sec). How can I optimize it?
>>>
>>> package com.test.hive.udf;
>>>
>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>> import java.net.URLDecoder;
>>>
>>> public final class urldecode extends UDF {
>>>
>>>    public String evaluate(final String s) {
>>>        if (s == null) { return null; }
>>>        return getString(s);
>>>    }
>>>
>>>    public static String getString(String s) {
>>>        String a;
>>>        try {
>>>            a = URLDecoder.decode(s);
>>>        } catch ( Exception e) {
>>>            a = "";
>>>        }
>>>        return a;
>>>    }
>>>
>>>    public static void main(String args[]) {
>>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>        System.out.println( getString(t) );
>>>    }
>>> }
>>
>>
>
> Also, you should use class-level private members to save on object
> instantiation and garbage collection.
>
> You may also benefit from matching the argument types to what normally
> arrives from upstream. Hive converts Text to String when needed, but if the
> data coming into the method is normally Text, you could try declaring the
> argument as Text and see if it is any faster.
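
A minimal sketch of the two suggestions above: keep the result object as a class-level private member so it is not re-created for every row, and accept the same type that arrives from upstream. So that it runs on a plain JDK, the hypothetical Holder class stands in for Hadoop's org.apache.hadoop.io.Text, and the hypothetical UrlDecode class stands in for a class that would extend org.apache.hadoop.hive.ql.exec.UDF. Neither name is from the original code, and whether this is actually faster for a given job would need to be measured.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class ReusedResultSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.Text: a mutable value holder.
    static final class Holder {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Hypothetical stand-in for the UDF; in Hive this class would extend UDF.
    static final class UrlDecode {
        // Class-level member: allocated once, then overwritten on every call.
        private final Holder result = new Holder();

        // Argument and return type match the upstream type (Text in Hive).
        Holder evaluate(final Holder input) {
            if (input == null) {
                return null;
            }
            try {
                // Two-argument form; the one-argument decode(String) is deprecated.
                result.set(URLDecoder.decode(input.toString(), "UTF-8"));
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                // Same fallback as the original code: empty string on bad input.
                result.set("");
            }
            return result;
        }
    }

    public static void main(String[] args) {
        UrlDecode udf = new UrlDecode();
        Holder in = new Holder();
        in.set("%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A");
        System.out.println(udf.evaluate(in));
    }
}
```

Because the same result object is handed back on every call, a caller must copy the value if it needs to keep it past the next call to evaluate.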