
Hive >> mail # user >> slow performance when using udf


Re: slow performance when using udf
Finally, the following code shows no performance loss. I think the key is
to avoid the getString method. Thanks again, everyone.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.net.URLDecoder;

public final class urldecode extends UDF {

    // Reused across calls: one Text per UDF instance instead of one per row.
    private final Text t = new Text();

    // Text in, Text out: matches what Hive passes for string columns, so Hive
    // does not have to convert the argument and result to and from String
    // (decode() itself still works on a String internally).
    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set(URLDecoder.decode(s.toString(), "UTF-8"));
            return t;
        } catch (Exception e) {
            // Malformed escapes (IllegalArgumentException) map to NULL.
            return null;
        }
    }
}
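To try the decoding logic outside Hive, for example on the sample string from the original post, a plain-JDK harness like the one below could be used. It is only a sketch: the class name UrlDecodeCheck and the method decodeOrNull are hypothetical, and it works on String rather than Text so that no Hive or Hadoop jars are needed on the classpath.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/**
 * Hypothetical plain-JDK harness (not part of Hive): the same decoding
 * logic as the Text-based evaluate(), but on java.lang.String.
 */
public final class UrlDecodeCheck {

    /** Decodes as UTF-8; returns null for null input or a malformed escape. */
    public static String decodeOrNull(String s) {
        if (s == null) { return null; }
        try {
            // Explicit charset: the one-argument URLDecoder.decode(String)
            // is deprecated and depends on the platform default encoding.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // UTF-8 is always supported; IllegalArgumentException signals
            // an incomplete or non-hex %XX escape.
            return null;
        }
    }
}
```

Called with the sample string %E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A, decodeOrNull returns 太原-三亚, while a truncated escape such as %E5%A returns null, matching the UDF's NULL-on-error behavior.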
On Tue, Aug 16, 2011 at 10:47 AM, wd <[EMAIL PROTECTED]> wrote:
> Thanks for all your advice, I'll try it out.
>
> On Mon, Aug 15, 2011 at 9:02 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>
>>
>> On Monday, August 15, 2011, Carl Steinbach <[EMAIL PROTECTED]> wrote:
>>> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
>>> should help somewhat with performance.
>>> On Mon, Aug 15, 2011 at 1:49 AM, wd <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I created a UDF to decode URL-encoded strings, but found that the
>>>> MapReduce job is now about 3 times slower than before (73 sec -> 213 sec).
>>>> How can I optimize it?
>>>>
>>>> package com.test.hive.udf;
>>>>
>>>> import org.apache.hadoop.hive.ql.exec.UDF;
>>>> import java.net.URLDecoder;
>>>>
>>>> public final class urldecode extends UDF {
>>>>
>>>>    public String evaluate(final String s) {
>>>>        if (s == null) { return null; }
>>>>        return getString(s);
>>>>    }
>>>>
>>>>    public static String getString(String s) {
>>>>        String a;
>>>>        try {
>>>>            a = URLDecoder.decode(s);
>>>>        } catch ( Exception e) {
>>>>            a = "";
>>>>        }
>>>>        return a;
>>>>    }
>>>>
>>>>    public static void main(String args[]) {
>>>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>>>        System.out.println( getString(t) );
>>>>    }
>>>> }
>>>
>>>
>>
>> Also, you should use class-level private members to save on object
>> instantiation and garbage collection.
>>
>> You can also benefit from matching the argument types to what you would
>> normally expect from upstream. Hive converts Text to String when needed, but
>> if the data coming into the method is normally Text, you could try matching
>> the argument type and see whether it is any faster.
>
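Edward's two suggestions, reusing class-level members and matching the argument type to what arrives from upstream, can be pushed further by decoding the raw bytes directly instead of converting Text to String and back. The following is a minimal sketch under stated assumptions: ReusingPercentDecoder is a hypothetical helper, not a Hive or Hadoop API; it assumes the input is the UTF-8 byte content of a Text value (for example Text.getBytes() together with Text.getLength()); and, unlike URLDecoder, it passes invalid UTF-8 byte sequences through unchanged instead of replacing them. Whether it is measurably faster than URLDecoder on a given dataset has not been tested here.

```java
/**
 * Hypothetical helper (not part of Hive or Hadoop): decodes
 * application/x-www-form-urlencoded bytes without going through String.
 * The output buffer is a class-level member reused across calls, so in
 * steady state no new objects are allocated per row.
 */
public final class ReusingPercentDecoder {

    // Grown only when an input longer than the current buffer arrives;
    // decoded output is never longer than the input.
    private byte[] buf = new byte[64];

    /**
     * Decodes in[0..length): '+' becomes a space and %XX becomes byte 0xXX.
     * Returns the decoded length (result is in buffer()), or -1 if an
     * escape is truncated or not hexadecimal.
     */
    public int decode(byte[] in, int length) {
        if (buf.length < length) {
            buf = new byte[Math.max(length, buf.length * 2)];
        }
        int n = 0;
        for (int i = 0; i < length; i++) {
            byte b = in[i];
            if (b == '+') {
                buf[n++] = (byte) ' ';
            } else if (b == '%') {
                if (i + 2 >= length) { return -1; }      // truncated escape
                int hi = Character.digit(in[i + 1], 16);  // -1 if not hex
                int lo = Character.digit(in[i + 2], 16);
                if (hi < 0 || lo < 0) { return -1; }
                buf[n++] = (byte) ((hi << 4) | lo);
                i += 2;
            } else {
                buf[n++] = b;                             // copy byte as-is
            }
        }
        return n;
    }

    /** Backing array holding the last decoded result; valid up to the returned length. */
    public byte[] buffer() {
        return buf;
    }
}
```

Inside an evaluate(Text s) method, the decoder would be another class-level field, and the result could be copied into the reused output Text with t.set(decoder.buffer(), 0, n), returning null when n is -1.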