Re: UDTF fails when used in LATERAL VIEW
Mark Grover 2012-06-22, 13:53
Hi Jan,
Yeah, you are right, initialize() has to use the correct ObjectInspector.

Here is a blog post I am in the process of writing on how to write a UDF:
http://mark.thegrovers.ca/1/post/2012/06/how-to-write-a-hive-udf.html

And the code it references is a UDF I wrote:
https://github.com/markgrover/hive-translate/blob/master/GenericUDFTranslate.java

It isn't directly related to your use case, but it shows how object inspectors correspond to various types. As you will see in the code, because I am returning a Text object from evaluate(), initialize() returns PrimitiveObjectInspectorFactory.writableStringObjectInspector.

Glad you found the solution. Sorry that you learned it the hard way though!

Mark

----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 22, 2012 1:59:03 AM
Subject: Re: UDTF fails when used in LATERAL VIEW

Hi Mark,

Thanks for the suggestion, it is not that naïve :) I tried a lot of things and combinations, including Text and even LazyString (as I was getting exceptions about converting String to LazyString at one point...). But I guess what I missed was the correct setting of the field object inspectors in initialize(). Only today I found out that the correct way to do this is to use WritableStringObjectInspector:

    fieldNames.add("section");
    fieldOIs.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);

I couldn't find it before, since I was looking for TextObjectInspector, which obviously doesn't exist - silly me :)

Anyway, it doesn't fail this way, but things get even weirder. Simple queries over a table without my SerDe and InputFormat, as well as SELECT my_func() ..., work well, but the LATERAL VIEW query now returns 0 lines. At the end of a task log there is the following:

    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 finished. closing...
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 forwarded 1294158 rows
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 1294158 rows
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 finished. closing...
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 forwarded 1229240 rows
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:64918
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 1229240 rows
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:1229240
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:0
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing...
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 forwarded 1229240 rows
    2012-06-22 07:42:23,974 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 finished. closing...
    2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 forwarded 2654579 rows
    2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 finished. closing...
    2012-06-22 07:42:23,975 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 5 forwarded 0 rows
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.UDTFOperator: 4 Close done
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 3 Close done
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
    2012-06-22 07:42:24,067 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 6 Close done

So it looks like the UDTF returns something (operator 4 forwarded 2654579 rows), but it disappears in FileSinkOperator (operator 5 forwarded 0 rows). Or is this because the query was executed from the hive cli, so the output is not written to a file but streamed directly?

Also, I would like to ask what the correct way is to set the Text value before forwarding. I've tried the following three ways:

    PrimitiveObjectInspectorFactory.writableStringObjectInspector.getPrimitiveWritableObject(forwardListObj[0]).set(output);
    ((Text)forwardListObj[0]).set(output);
    forwardListObj[0] = new Text(output);

All of them seem to work exactly the same. I know that the third could cause performance problems, but I'm not sure which of the first two is preferred.

Thanks again for your assistance,
Jan

On 6/22/12, Mark Grover <[EMAIL PROTECTED]> wrote:
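On Jan's question about setting the Text value before forwarding, here is a minimal plain-Java sketch of the trade-off between reusing one mutable object and allocating a new one per row. It has no Hive dependency: MutableText is a hypothetical stand-in for a mutable writable such as Text, and the "sink" list stands in for whatever consumes a forwarded row. It assumes the consumer reads the value before the next row is produced, which is an assumption about the pipeline and not something the thread confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForwardReuseSketch {

    /** Hypothetical stand-in for a mutable writable such as Text; not a Hive or Hadoop class. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Reuses one holder for every row; the consumer copies the value out immediately. */
    public static List<String> forwardReusing(List<String> outputs) {
        List<String> sink = new ArrayList<>();
        Object[] forwardListObj = { new MutableText() };
        for (String output : outputs) {
            ((MutableText) forwardListObj[0]).set(output);
            sink.add(forwardListObj[0].toString()); // consumed before the next set()
        }
        return sink;
    }

    /** Allocates a fresh holder for every row: same result, one extra object per row. */
    public static List<String> forwardAllocating(List<String> outputs) {
        List<String> sink = new ArrayList<>();
        Object[] forwardListObj = new Object[1];
        for (String output : outputs) {
            MutableText fresh = new MutableText();
            fresh.set(output);
            forwardListObj[0] = fresh;
            sink.add(forwardListObj[0].toString());
        }
        return sink;
    }

    /** Shows the hazard of reuse when a consumer keeps the reference instead of copying. */
    public static List<String> retainWithoutCopy(List<String> outputs) {
        List<Object> retained = new ArrayList<>();
        Object[] forwardListObj = { new MutableText() };
        for (String output : outputs) {
            ((MutableText) forwardListObj[0]).set(output);
            retained.add(forwardListObj[0]); // same object stored every time
        }
        List<String> seen = new ArrayList<>();
        for (Object o : retained) {
            seen.add(o.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> outputs = Arrays.asList("intro", "body", "summary");
        System.out.println(forwardReusing(outputs));
        System.out.println(forwardAllocating(outputs));
        System.out.println(retainWithoutCopy(outputs));
    }
}
```

When the consumer reads each row right away, reusing and allocating produce identical output, which would be consistent with all three variants appearing to behave the same; the allocating version simply creates one more object per row. If the writable string inspector returns the very object it is given (an assumption about Hive that is not checked here), the first two variants in the message are the same in-place set() reached by different routes.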