-
slow performance when using udf
wd 2011-08-15, 08:49
hi,
I created a UDF to decode URL-encoded strings, but the MapReduce job now takes about 3 times as long as before (73 sec -> 213 sec). How can I optimize it?
package com.test.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    public String evaluate(final String s) {
        if (s == null) { return null; }
        return getString(s);
    }

    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s);
        } catch ( Exception e) {
            a = "";
        }
        return a;
    }

    public static void main(String args[]) {
        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        System.out.println( getString(t) );
    }
}
-
Re: slow performance when using udf
Carl Steinbach 2011-08-15, 09:22
Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF) should help some with performance.
-
Re: slow performance when using udf
Edward Capriolo 2011-08-15, 13:02
Also, you should use class-level private members to save on object instantiation and garbage collection.
You also get benefits by matching the argument types with what you would normally expect from upstream. Hive converts Text to String when needed, but if the data normally coming into the method is Text, you could try matching the argument type and see if it is any faster.
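To illustrate the reuse idea outside Hive, here is a minimal stdlib-only sketch. MutableText is a hypothetical stand-in for a mutable holder such as org.apache.hadoop.io.Text, and ReusingDecoder is likewise an invented name; neither is part of Hive or Hadoop. It only shows the pattern of allocating the result object once and overwriting it on each call, and makes no claim about how much time that saves in a real job.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-in for a mutable, reusable value holder such as
// org.apache.hadoop.io.Text. Not part of Hive or Hadoop.
final class MutableText {
    private String value = "";

    void set(String v) { value = v; }

    @Override
    public String toString() { return value; }
}

// Hypothetical decoder that follows the "class-level private member" advice.
final class ReusingDecoder {
    // Allocated once per decoder instance and overwritten on every call.
    private final MutableText result = new MutableText();

    MutableText evaluate(String s) {
        if (s == null) {
            return null;
        }
        try {
            result.set(URLDecoder.decode(s, "UTF-8"));
            return result;
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // Unknown charset or malformed %-escape: treat as no value.
            return null;
        }
    }
}
```

Two caveats on this sketch: URLDecoder.decode still builds a new String on every call, so only the wrapper object is saved, and a caller that needs to keep a result past the next call has to copy it, because the same instance is returned each time.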
-
Re: slow performance when using udf
wd 2011-08-16, 02:47
Thanks for all your advice, I'll try it out.
-
Re: slow performance when using udf
wd 2011-08-16, 06:33
Finally, the following code shows no performance loss. I think the point is to avoid using the getString method. Thanks again, everyone.
//import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    private Text t = new Text();

    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set( URLDecoder.decode( s.toString(), "UTF-8" ));
            return t;
        } catch ( Exception e) {
            return null;
        }
    }

    //public static void main(String args[]) {
    //String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
    //System.out.println( getString(t) );
    //}
}
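Besides the Text argument and the reused member, the final version also passes "UTF-8" explicitly, whereas the first version called the one-argument URLDecoder.decode(String), which is deprecated and falls back to the platform default encoding. The stdlib-only sketch below shows just that decoding step on its own, using the sample string from the first message. UrlDecodeSketch and decodeUtf8 are hypothetical names for illustration, and whether the charset change contributed to the speedup is not established in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper isolating the decoding step of the UDF above.
final class UrlDecodeSketch {

    // Decodes with an explicit charset and returns null on bad input,
    // mirroring the null-on-error behavior of the UDF above.
    static String decodeUtf8(String s) {
        if (s == null) {
            return null;
        }
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Sample input taken from the original post.
        System.out.println(decodeUtf8("%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A"));
        // A malformed %-escape yields null rather than an exception.
        System.out.println(decodeUtf8("%zz"));
    }
}
```

Naming the charset makes the result independent of the JVM's default encoding, which can differ between a development machine and the cluster nodes.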