Re: UDTF fails when used in LATERAL VIEW
Hi Jan,
Yeah, you are right, initialize has to use the correct ObjectInspector.

Here is a blog post I am in the process of writing on how to write a UDF:

http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html

And, the code it references is a UDF I wrote:
https://github.com/markgrover/hive-translate/blob/master/GenericUDFTranslate.java

It isn't directly related to your use case, but it shows how object inspectors correspond to various types. As you will see in the code, because I am returning a Text object from evaluate(), initialize() returns PrimitiveObjectInspectorFactory.writableStringObjectInspector.

Glad you found the solution. Sorry that you learned it the hard way though!

Mark

----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 22, 2012 1:59:03 AM
Subject: Re: UDTF fails when used in LATERAL VIEW

Hi Mark,

Thanks for the suggestion, it is not that naïve :) I tried a lot of things
and combinations, including Text and even LazyString (as I was getting
exceptions about converting String to LazyString at one moment...).

But I guess what I missed was setting the field object inspectors
correctly in initialize(). Only today I found out that the correct way
to do this is to use writableStringObjectInspector:

    fieldNames.add("section");
    fieldOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);

I couldn't find it before, since I was looking for
TextObjectInspector, which obviously doesn't exist - silly me :)
Anyway, it doesn't fail this way, but things get even weirder.

Simple queries over a table without my SerDe and InputFormat, as
well as SELECT my_func() ..., work well, but the LATERAL VIEW query
now returns 0 rows.

At the end of the task log there is the following:

2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 forwarded 1294158 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1294158 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:64918
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1229240 rows
2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 finished. closing...
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 forwarded 2654579 rows
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 finished. closing...
2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 forwarded 0 rows
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 Close done

So it looks like the UDTF returns something, but it disappears in
FileSinkOperator. Or is this because the query was executed from the
Hive CLI, so it is not written to a file but streamed directly?
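
To compare the per-operator counts in a task log like the one above, the "forwarded N rows" lines can be tabulated mechanically. The following is only a minimal sketch: the class name ForwardedRows and the regular expression are illustrative assumptions, it expects each log entry on a single line, and a zero count in the output does not by itself show that rows were lost.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: tabulates "forwarded N rows" log lines.
public class ForwardedRows {
    // Matches e.g. "...ql.exec.UDTFOperator: 4 forwarded 2654579 rows"
    private static final Pattern FORWARDED =
        Pattern.compile("\\.(\\w+Operator): (\\d+) forwarded (\\d+) rows");

    /** Returns "OperatorName#id" -> forwarded row count, in log order. */
    public static Map<String, Long> parse(List<String> logLines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = FORWARDED.matcher(line);
            if (m.find()) {
                counts.put(m.group(1) + "#" + m.group(2), Long.parseLong(m.group(3)));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three entries taken from the log excerpt, each joined onto one line.
        List<String> log = List.of(
            "2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1229240 rows",
            "2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 forwarded 2654579 rows",
            "2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 forwarded 0 rows");
        parse(log).forEach((op, rows) -> System.out.println(op + " -> " + rows));
    }
}
```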

Also, I would like to ask what is the correct way to set the Text
value before forwarding. I've tried the following three ways:
        PrimitiveObjectInspectorFactory.writableStringObjectInspector.getPrimitiveWritableObject(forwardListObj[0]).set(output);

        ((Text)forwardListObj[0]).set(output);

        forwardListObj[0] = new Text(output);

All of them seem to work exactly the same. I know that the third could
cause performance problems, but I'm not sure which of the first two is
preferred.
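
The general trade-off between updating an object in place and allocating a new one per row can be sketched outside Hive. This is an illustration under stated assumptions: MutableText is a hypothetical stand-in for a mutable writable, not Hadoop's org.apache.hadoop.io.Text, and the "consumer" here simply retains every reference it is handed, which may or may not match what Hive's downstream operators do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Hive or Hadoop.
public class ReuseVsAllocate {
    /** Stand-in for a mutable value holder with a set() method. */
    public static final class MutableText {
        private String value = "";
        public void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Reuses one holder for every row: no per-row allocation. */
    public static List<MutableText> forwardReusing(List<String> outputs) {
        List<MutableText> retained = new ArrayList<>();
        MutableText holder = new MutableText();
        for (String output : outputs) {
            holder.set(output);   // overwrite in place
            retained.add(holder); // consumer keeps the same reference each time
        }
        return retained;
    }

    /** Allocates a fresh holder per row: more garbage, but independent values. */
    public static List<MutableText> forwardAllocating(List<String> outputs) {
        List<MutableText> retained = new ArrayList<>();
        for (String output : outputs) {
            MutableText fresh = new MutableText();
            fresh.set(output);
            retained.add(fresh);
        }
        return retained;
    }

    public static void main(String[] args) {
        List<String> outputs = List.of("a", "b", "c");
        System.out.println(forwardReusing(outputs));    // prints [c, c, c]
        System.out.println(forwardAllocating(outputs)); // prints [a, b, c]
    }
}
```

Reuse avoids an allocation per row, but it is only safe when the receiver copies or fully consumes the value before the next set() call.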

Thanks again for your assistance,

Jan

On 6/22/12, Mark Grover <[EMAIL PROTECTED]> wrote: