Hive, mail # user - handling null argument in custom udf


Søren 2012-12-04, 13:31
Edward Capriolo 2012-12-04, 14:43
Søren 2012-12-04, 14:58
Re: handling null argument in custom udf
Mark Grover 2012-12-05, 03:31
Soren,
Can you share the complete stack trace, or the code? Even just a skeleton of
the code would help.
Look at the Ceil UDF, for example. It has a null check, and you should be able
to do something similar:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java#L43
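A minimal, dependency-free sketch of that kind of null check, assuming the same two-argument shape as your myfun. To keep it runnable without Hive on the classpath, it uses java.lang.String where a real simple UDF would take and return org.apache.hadoop.io.Text and extend org.apache.hadoop.hive.ql.exec.UDF. The class name NullSafeMyFun and the concatenation in the body are hypothetical stand-ins, since the real myfun logic isn't shown in the thread.

```java
// Dependency-free sketch of a null-guarded evaluate() for a two-argument simple UDF.
// Assumptions: String stands in for org.apache.hadoop.io.Text, and in a real Hive
// build this class would extend org.apache.hadoop.hive.ql.exec.UDF.
class NullSafeMyFun {

    // A NULL in either argument yields a NULL result (the same idea as the Ceil
    // check) instead of dereferencing the argument and throwing a NullPointerException.
    public String evaluate(final String argA, final String argB) {
        if (argA == null || argB == null) {
            return null;
        }
        // Hypothetical body: the real myfun logic is not shown in the thread.
        return argA + ":" + argB;
    }
}
```

With this guard, a row like {"col_a":"val","col_b":null} produces NULL for that row rather than failing the query.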

In the long run, though, I would encourage you to use GenericUDF. GenericUDFs
perform better because they don't use reflection. I wrote a blog post a while
back to help people get started with UDFs. It's at:
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html

Perhaps I should put the content on the Apache wiki, but in the meantime, take
a look at it.
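To illustrate the reflection point: a simple UDF's evaluate() is looked up and called reflectively, and java.lang.reflect.Method.invoke wraps anything the target method throws in an InvocationTargetException. The standalone sketch below uses plain JDK reflection, not Hive's own bridge code, and the class names UncheckedFun and ReflectiveCall are hypothetical. It shows that a NullPointerException raised inside evaluate() reaches the caller only as that wrapper, which may be why an "Unable to execute method ..." error can appear even when the failure actually happens inside the method body.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for a simple UDF whose evaluate() lacks a null check.
class UncheckedFun {
    public String evaluate(String argA, String argB) {
        return argA + argB.length(); // throws NullPointerException when argB is null
    }
}

// Dependency-free sketch of reflective dispatch (plain JDK reflection only).
final class ReflectiveCall {
    private ReflectiveCall() {}

    // Looks up evaluate(String, String) at runtime, invokes it, and describes
    // what the caller observes.
    static String describeCall(Object udf, Object... args) {
        try {
            Method m = udf.getClass().getMethod("evaluate", String.class, String.class);
            return "result=" + m.invoke(udf, args);
        } catch (InvocationTargetException e) {
            // The real failure is only available through getCause().
            return "wrapped " + e.getCause().getClass().getSimpleName();
        } catch (ReflectiveOperationException e) {
            return "lookup failed: " + e;
        }
    }
}
```

For example, ReflectiveCall.describeCall(new UncheckedFun(), "val", null) reports a wrapped NullPointerException, while ("val", "x") returns "result=val1".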

Using the Translate UDF as an example (reference:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java
)
If you would like to have a column accept nulls:
1. Allow the argument type to be the "void" type in initialize(), as is done at
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java#L151
2. Handle null values appropriately in evaluate(), as is done at
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java#L172
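The two steps above, sketched without Hive dependencies so the shape is easy to see. All names here are hypothetical stand-ins: the ArgType enum plays the role of the primitive category an argument's ObjectInspector reports (VOID being the type of a bare NULL literal), Supplier<String> plays the role of GenericUDF.DeferredObject, and IllegalArgumentException replaces UDFArgumentTypeException. The translate logic is a simplified, char-based illustration, not a copy of GenericUDFTranslate.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an argument's primitive category.
enum ArgType { STRING, VOID, INT }

// Dependency-free sketch of a GenericUDF-style translate that tolerates NULLs.
final class NullTolerantTranslate {

    // Step 1 (initialize): accept STRING or VOID for each of the three arguments.
    void initialize(ArgType[] argTypes) {
        if (argTypes.length != 3) {
            throw new IllegalArgumentException("translate takes exactly 3 arguments");
        }
        for (int i = 0; i < argTypes.length; i++) {
            if (argTypes[i] != ArgType.STRING && argTypes[i] != ArgType.VOID) {
                throw new IllegalArgumentException(
                    "argument " + (i + 1) + " must be a string or NULL, got " + argTypes[i]);
            }
        }
    }

    // Step 2 (evaluate): arguments are fetched lazily; any NULL yields a NULL result.
    String evaluate(List<Supplier<String>> args) {
        String input = args.get(0).get();
        String from = args.get(1).get();
        String to = args.get(2).get();
        if (input == null || from == null || to == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            int idx = from.indexOf(c); // first occurrence wins if 'from' has duplicates
            if (idx < 0) {
                out.append(c);               // not in 'from': keep unchanged
            } else if (idx < to.length()) {
                out.append(to.charAt(idx));  // replace with the positional counterpart
            }                                // otherwise no counterpart in 'to': delete
        }
        return out.toString();
    }
}
```

With input "abcdef", from "abc" and to "xy", this returns "xydef"; if any supplier returns null, the result is null, which is how the Hive documentation describes translate's NULL behavior.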

Good luck!
Mark

On Tue, Dec 4, 2012 at 6:58 AM, Søren <[EMAIL PROTECTED]> wrote:

>  Thanks. Did you mean I should handle null in my udf or my serde?
>
> I did try to check for null inside the code in my udf, but it fails even
> before it gets called.
>
> This is from when the udf fails:
> ....
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
> execute method public org.apache.hadoop.io.Text
> com.company.hive.myfun.evaluate(java.lang.Object,java.lang.Object)
> on object com.company.hive.myfun@1412332 of class com.company.hive.myfun with
> arguments {0:java.lang.Object, null} of size 2
>
> It looks like there is a null, or is this error message misleading?
>
>
>
> On 04/12/2012 15:43, Edward Capriolo wrote:
>
> There is no null argument. You should handle the null case in your code.
>
> if (argA == null)
>
> Optionally, you could use a GenericUDF, but a regular UDF should be able to
> handle what you are doing.
>
> On Tuesday, December 4, 2012, Søren <[EMAIL PROTECTED]> wrote:
> > Hi Hive community
> >
> > I have a custom UDF, say myfun, written in Java, which I use like this:
> >
> > select myfun(col_a, col_b) from mytable where ....etc
> >
> > col_b is a string type and sometimes it is null.
> >
> > When that happens, my query crashes with:
> > ---------------
> > java.lang.RuntimeException:
> > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> > processing row
> > {"col_a":"val","col_b":null}
> > ...
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
> > execute method public org.apache.hadoop.io.Text
> > ---------------
> >
> > public final class myfun extends UDF {
> >         public Text evaluate(final Text argA, final Text argB) {
> >
> > I'm unsure how to fix this properly. Is the framework looking for an
> > overload of evaluate that would accept the null argument?
> >
> > I should mention that the table is declared using my own JSON SerDe reading
> > from S3. I'm not processing nulls in my SerDe in any special way, because
> > Hive seems to handle null correctly when it is not passed to my own UDF.
> >
> > Is there anyone out there with ideas or experience on this issue?
> >
> > thanks in advance
> > Søren
> >
> >
>
>
>
Vivek Mishra 2012-12-05, 10:06
Vivek Mishra 2012-12-05, 10:10
Søren 2012-12-06, 10:43