Hive >> mail # user >> Hive 0.11.0 | Issue with ORC Tables


Re: Hive 0.11.0 | Issue with ORC Tables
On Thu, Sep 19, 2013 at 5:04 AM, Savant, Keshav <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> We have set up Apache Hive 0.11.0 on a Hadoop cluster (Apache Hadoop
> 0.20.203.0). Hive returns the expected results when tables are stored
> as TextFile.
>
> However, when we run select queries on tables stored as ORC (Optimized
> Row Columnar), Hive 0.11.0's new storage format, the queries fail with
> an exception.
>
> Stack trace of the exception:
>
> 2013-09-19 20:33:38,095 ERROR CliDriver (SessionState.java:printError(386)) - Failed with exception java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>
> java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>         at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:49)
>         at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:754)
>         at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:294)
>         at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
>         at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:10129)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:9993)
>         at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:9970)
>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:193)
>         at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:56)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168)

The problem is that "load data" doesn't convert the file into ORC format.
You need to use the following commands:

CREATE TABLE person_staging (id INT, name STRING);

LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person_staging;

SELECT * FROM person_staging;

INSERT OVERWRITE TABLE person SELECT * FROM person_staging;

SELECT * FROM person;
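For reference, the DDL for the ORC table "person" isn't shown in this
thread; assuming it uses the same two-column schema as the staging table,
it would have been created along these lines:

CREATE TABLE person (id INT, name STRING) STORED AS ORC;

The key difference from the staging table is the STORED AS ORC clause:
rows written through INSERT ... SELECT get serialized as ORC files,
whereas LOAD DATA just moves the input file into the table's directory
unchanged, which is why the ORC reader then fails to parse it.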

Sorry for the bad error message. I improved the ORC reader to explicitly
check that the file is actually an ORC file in
https://issues.apache.org/jira/browse/HIVE-4724 .