Hive, mail # user - A bug belongs to Hive or Elephant-bird


A bug belongs to Hive or Elephant-bird
java8964 java8964 2013-03-08, 19:45

Hi,
Hive 0.9.0 + Elephant-Bird 3.0.7
I ran into a problem using elephant-bird with Hive. I think I know what causes it, but I don't know which side the bug belongs to. Let me explain the problem.
If we define a Google protobuf file with a field name like 'dateString' (the field name contains an uppercase 'S'), then when I query the table like this:
select dateString from table .............

I will get the following exception trace:
Caused by: java.lang.RuntimeException: cannot find field datestring from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f .....................
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
        at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)

Here is the code of the method that throws this error:
  public static StructField getStandardStructFieldRef(String fieldName,
      List<? extends StructField> fields) {
    fieldName = fieldName.toLowerCase();
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).getFieldName().equals(fieldName)) {
        return fields.get(i);
      }
    }
    // For backward compatibility: fieldNames can also be integer Strings.
    try {
      int i = Integer.parseInt(fieldName);
      if (i >= 0 && i < fields.size()) {
        return fields.get(i);
      }
    } catch (NumberFormatException e) {
      // ignore
    }
    throw new RuntimeException("cannot find field " + fieldName + " from "
        + fields);
    // return null;
  }
I understand that the problem happens because at this point fieldName is "datestring" (all lowercase characters), while the list of fields holds that field under the name "dateString", and that is why the RuntimeException is thrown.
But I don't know which side this bug belongs to, and I would like to know more about the contract of the Hive implementation.
From this link: https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F
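The mismatch can be reproduced without Hive on the classpath. The following is only a minimal sketch: plain strings stand in for StructField, and the class name FieldLookupDemo, the method findField, and the field list are hypothetical names made up for illustration. Only the lowercase-then-equals logic is taken from the method quoted above.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the lookup in getStandardStructFieldRef.
// Assumption: plain Strings replace StructField so this runs without Hive.
class FieldLookupDemo {

    // The requested name is lowercased, but the stored names are compared
    // exactly as the ObjectInspector reports them.
    static String findField(String fieldName, List<String> fieldNames) {
        String wanted = fieldName.toLowerCase();
        for (String name : fieldNames) {
            if (name.equals(wanted)) {
                return name;
            }
        }
        throw new RuntimeException(
            "cannot find field " + wanted + " from " + fieldNames);
    }

    public static void main(String[] args) {
        // Hypothetical field names as a protobuf-based inspector might report them.
        List<String> fields = Arrays.asList("name", "dateString");

        // An all-lowercase stored name is found regardless of query casing.
        System.out.println(findField("Name", fields));

        // A mixed-case stored name can never match the lowercased request.
        try {
            findField("dateString", fields);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch any stored name containing an uppercase character is unreachable, whatever casing the query uses.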
I know that in Hive, table names and column names are case insensitive, so even though my query used "select dateString", the fieldName was changed to "datestring" in the code. But the StructField of the ObjectInspector from elephant-bird returns EXACTLY the field name defined in the code, "dateString" in this case. Of course, I can change my proto file to use only lowercase field names to bypass this bug, but my questions are:
1) If I implement my own ObjectInspector, should I pay attention to the field names? Do they need to be lowercase?
2) I would consider this a bug in Hive, right? If this line:
fieldName = fieldName.toLowerCase();
lowercases the requested name, then the comparison should lowercase the other side as well, by changing
if (fields.get(i).getFieldName().equals(fieldName))
to
if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))
right?
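
To illustrate what the proposed change would do, here is a minimal sketch with plain strings standing in for StructField. The class name CaseInsensitiveLookupDemo, the method findField, and the field list are hypothetical; this is not Hive's code, and it uses String.equalsIgnoreCase in place of the toLowerCase() call suggested above, which has the same effect for ASCII names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a case-insensitive field lookup.
// Assumption: plain Strings replace StructField so this runs without Hive.
class CaseInsensitiveLookupDemo {

    // Compares the requested name with each stored name ignoring case and
    // returns the stored name in its original casing.
    static String findField(String fieldName, List<String> fieldNames) {
        for (String name : fieldNames) {
            if (name.equalsIgnoreCase(fieldName)) {
                return name;
            }
        }
        throw new RuntimeException(
            "cannot find field " + fieldName + " from " + fieldNames);
    }

    public static void main(String[] args) {
        // Hypothetical field names as a protobuf-based inspector might report them.
        List<String> fields = Arrays.asList("name", "dateString");

        // The lowercased name that Hive passes in now resolves to the
        // mixed-case field.
        System.out.println(findField("datestring", fields)); // prints "dateString"
    }
}
```

One consequence of comparing this way is that two fields whose names differ only in case become indistinguishable, and the first one in the list wins.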
Thanks
Yong