Hive >> mail # user >> A bug belongs to Hive or Elephant-bird


A bug belongs to Hive or Elephant-bird

Hi,
Hive 0.9.0 + Elephant-Bird 3.0.7
I ran into a problem using Elephant-Bird with Hive. I think I know what causes it, but I don't know which side the bug belongs to. Let me explain the problem.
If we define a Google protobuf file with a field name like 'dateString' (the name contains an uppercase 'S') and I query the table like this:
select dateString from table .............

I will get the following exception trace:
Caused by: java.lang.RuntimeException: cannot find field datestring from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f .....................
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
        at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)

Here is the code of the method that throws this error:
  public static StructField getStandardStructFieldRef(String fieldName,
      List<? extends StructField> fields) {
    fieldName = fieldName.toLowerCase();
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).getFieldName().equals(fieldName)) {
        return fields.get(i);
      }
    }
    // For backward compatibility: fieldNames can also be integer Strings.
    try {
      int i = Integer.parseInt(fieldName);
      if (i >= 0 && i < fields.size()) {
        return fields.get(i);
      }
    } catch (NumberFormatException e) {
      // ignore
    }
    throw new RuntimeException("cannot find field " + fieldName + " from "
        + fields);
    // return null;
  }
I understand that the problem happens because, at this point, fieldName is "datestring" (all lowercase characters), while the matching entry in the list of fields is named "dateString". The exact comparison fails, and that is why the RuntimeException is thrown.
But I don't know which side this bug belongs to, and I would like to understand the contract of the Hive implementation in more detail.
From this link: https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F
I know that in Hive, table names and column names are supposed to be case insensitive. So even though my query used "select dateString", the fieldName was changed to "datestring" in the code, while the StructField of the ObjectInspector from Elephant-Bird returns the EXACT field name as defined in the code, "dateString" in this case. Of course, I can change my proto file to use only lowercase field names to work around this bug, but my questions are:
1) If I implement my own ObjectInspector, should I pay attention to the field names? Do they need to be lowercase?
2) I would consider this a bug in Hive, right? If this line lowercases the requested name:
  fieldName = fieldName.toLowerCase();
then the comparison should lowercase the other side as well, by changing
  if (fields.get(i).getFieldName().equals(fieldName))
to
  if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))
Is that right?
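
To make the mismatch concrete, here is a minimal standalone sketch of the two lookups. It does not use Hive at all: the Field class is a hypothetical stand-in for StructField, the class and method names are made up for illustration, and a miss returns null instead of throwing (the integer-index fallback is left out). It only shows that lowercasing one side fails for a field reported as "dateString", while a comparison that ignores case on both sides finds it.

```java
import java.util.Arrays;
import java.util.List;

// Standalone illustration only; none of these types are Hive classes.
final class FieldLookupSketch {

    /** Hypothetical stand-in for StructField: reports its name exactly as declared. */
    static final class Field {
        private final String name;

        Field(String name) {
            this.name = name;
        }

        String getFieldName() {
            return name;
        }
    }

    /** Mirrors the quoted loop: lowercases only the requested name, then compares exactly. */
    static Field findAsQuoted(String fieldName, List<Field> fields) {
        fieldName = fieldName.toLowerCase();
        for (Field f : fields) {
            if (f.getFieldName().equals(fieldName)) {
                return f;
            }
        }
        return null; // simplification: the quoted method throws RuntimeException here
    }

    /** Proposed variant: the comparison ignores case on both sides. */
    static Field findIgnoringCase(String fieldName, List<Field> fields) {
        for (Field f : fields) {
            if (f.getFieldName().equalsIgnoreCase(fieldName)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(new Field("id"), new Field("dateString"));
        // The mixed-case field is missed by the quoted logic...
        System.out.println(findAsQuoted("dateString", fields) == null);     // prints true
        // ...and found once both sides are compared case-insensitively.
        System.out.println(findIgnoringCase("dateString", fields) != null); // prints true
    }
}
```

The other way to get the same result would be for the ObjectInspector to hand out lowercase field names in the first place, which is what question 1 is asking about.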
Thanks
Yong
     