Hive, mail # user - Any suggestions java.io.IOException: Not a data file error


Re: Any suggestions java.io.IOException: Not a data file error
Jakob Homan 2013-11-08, 19:34
This is not supported.  The assumption is that all the files in the
directory will be Avro.  This is a general assumption across Hive, not
specific to the Avro serde.
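
Since every file under the table location has to be Avro, one practical step is to find the stray files before running the query. Per the Avro specification, an object container file starts with the four bytes 'O', 'b', 'j', 0x01, and the trace below fails inside DataFileStream.initialize, where that header is read. The following is only a rough sketch, not part of Hive or Avro: it assumes the directory has first been copied to local disk (for example with hdfs dfs -get), and the names is_avro_container and find_non_avro are made up for illustration.

```python
from pathlib import Path

# Avro object container files begin with these four bytes: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


def find_non_avro(directory):
    """Return sorted names of regular files directly under `directory`
    that do not start with the Avro magic header (hypothetical helper)."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not is_avro_container(p)
    )
```

Calling find_non_avro on a local copy of /testdata would list files such as test.txt, which can then be moved out of the table directory. The check only looks at the header, so it does not prove that a file is a complete, valid Avro file.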

On 10/30/2013 01:50 AM, Valluri, Sathish wrote:
>
> Resending after disabling security signing..
>
> *From:*Valluri, Sathish [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, October 30, 2013 2:17 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Any suggestions java.io.IOException: Not a data file error
>
> Hi All,
>
> Hive MapReduce jobs fail with *java.io.IOException: Not a data file*
> if the table's HDFS directory contains files other than Avro files.
>
> I have created a Hive external table as shown below,
>
> CREATE EXTERNAL TABLE testTable
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/testdata/';
>
> I then run: select count(*) from testTable;
>
> When /testdata contains only Avro files, the query works and returns
> the correct results.
>
> If /testdata contains files in some other format, for example
> */testdata/test.txt*, the query fails with the following error.
>
> java.io.IOException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
>     ... 11 more
> *Caused by: java.io.IOException: Not a data file.*
>     at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>     at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>     at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
>     at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
>     ... 16 more
>
> Can anyone suggest a parameter or other change that would let the
> query succeed? Ideally, Hive should skip files in other formats and
> load only the Avro files when processing data on HDFS.
>
> Waiting for any suggestions to resolve this issue.
>
> Regards