Hive >> mail # user >> UDTF fails when used in LATERAL VIEW


Re: UDTF fails when used in LATERAL VIEW
Hi Jan,
Here's my first naïve question:-)

Have you tried returning a Text value instead of a String? At least in the case of UDFs, returning Text instead of String is possible and recommended. I would think the same applies to UDTFs.
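
To make the suggestion concrete, here is a minimal sketch of the mismatch the stack trace points at: a consumer that assumes a Text-like object receives a plain String. Everything in it is a stand-in and an assumption, not Hive code: `TextLike` plays the role of `org.apache.hadoop.io.Text`, and `inspectAsWritable` plays the role of a writable-string inspector that casts whatever it is handed. In a real UDTF the corresponding change would presumably be to forward a reused `Text` and declare `PrimitiveObjectInspectorFactory.writableStringObjectInspector`, but that is untested here.

```java
// Stand-in sketch only: these are NOT Hive's classes. It models a forwarded
// object whose runtime type differs from what a downstream consumer expects.
final class ForwardTypeSketch {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text (a mutable string holder). */
    static final class TextLike {
        private String value = "";

        void set(String s) {
            value = s;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Hypothetical stand-in for a writable-string inspector: it casts without checking. */
    static String inspectAsWritable(Object o) {
        return ((TextLike) o).toString();
    }

    /** Forwards a raw String; returns true if the downstream cast throws. */
    static boolean forwardStringFails(String s) {
        Object[] row = { s };
        try {
            inspectAsWritable(row[0]);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Forwards a reused TextLike holder instead, so the downstream cast succeeds. */
    static String forwardText(TextLike reusable, String s) {
        reusable.set(s);
        Object[] row = { reusable };
        return inspectAsWritable(row[0]);
    }

    public static void main(String[] args) {
        System.out.println(forwardStringFails("section-a"));          // prints "true"
        System.out.println(forwardText(new TextLike(), "section-a")); // prints "section-a"
    }
}
```

The sketch only shows why the cast fails. It does not explain why Hive picks a writable inspector in the LATERAL VIEW plan but not in the plain SELECT.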

Mark

----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: "user" <[EMAIL PROTECTED]>
Sent: Thursday, June 21, 2012 8:02:20 AM
Subject: UDTF fails when used in LATERAL VIEW

Hi,

I've hit problems when writing a custom UDTF that should return string
values. I couldn't find anywhere what type the values that get
forward()ed to the collector should have. The only info I could dig out
of Google was a few blogs with examples and the 4 UDTFs that are in the
Hive sources. From that I figured that it should be OK to simply pass
Strings inside the forwarded Object[] array. Here are the relevant
parts of my code:

      private Object[] forwardListObj;

      @Override
      public StructObjectInspector initialize(ObjectInspector[] args)
          throws UDFArgumentException {

        // snipped irrelevant code

        forwardListObj = new Object[1];
        forwardListObj[0] = new String();

        ArrayList<String> fieldNames = new ArrayList<String>(1);
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);

        fieldNames.add("section");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
      }

In process() there is a simple forwarding of some String:

      forwardListObj[0] = "";
      forward(forwardListObj);
      // OR
      String s = ...
      forwardListObj[0] = s;
      forward(forwardListObj);

I was testing the function with a simple query

SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);

and it worked just as intended. But when I moved from testing to
actually using the function in more complex queries, I got into
trouble. Even a LATERAL VIEW statement can cause failures:

SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);

causes tasks to fail with the exception

java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
...

I should also mention that I use a custom SerDe and InputFormat for the
'logs' table. While trying to figure this out, I ran the same queries
as listed above on a different table without the customizations, and
they worked correctly too. So I think the SerDe and/or InputFormat
probably play some role in this as well. What I don't understand is why
the problem shows up only with LATERAL VIEW. Any ideas, anyone? Also,
is it really correct to send a String in forward()?

Best regards,
Jan