NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Hive >> mail # user >> Any suggestions? java.io.IOException: Not a data file error


Re: Any suggestions? java.io.IOException: Not a data file error
This is not supported. Hive assumes that every file in the table's
directory is an Avro file. This is a general assumption across Hive, not
something specific to the Avro SerDe.
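For reference, the check that fails here is the container-file header check: an Avro object container file starts with the four bytes 'O', 'b', 'j', 1, and Avro's DataFileStream throws "Not a data file." when a file does not. Since Hive will try to read every file under the table's LOCATION, one practical approach is to find the non-Avro files and move them out of that directory. The Java sketch below does that for a local directory. The class and method names are hypothetical, and it uses java.nio on the local filesystem only, so scanning HDFS directly would need the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: lists the regular files in a local directory that do
 * not start with the Avro object container magic bytes. Those are the files
 * that make Avro's DataFileStream fail with "Not a data file."
 * Assumption: local filesystem only (not HDFS).
 */
public class AvroMagicCheck {

    // Avro object container files begin with these four bytes: 'O', 'b', 'j', 1.
    static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the header starts with the Avro magic bytes. */
    static boolean hasAvroMagic(byte[] header) {
        return header.length >= AVRO_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    /** Reads at most the first four bytes of a file and checks them. */
    static boolean isAvroContainer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[AVRO_MAGIC.length];
            int n = in.readNBytes(header, 0, header.length);
            // Files shorter than four bytes cannot be Avro containers.
            return n == AVRO_MAGIC.length && hasAvroMagic(header);
        }
    }

    /** Returns the regular files in dir that are not Avro container files. */
    static List<Path> nonAvroFiles(Path dir) throws IOException {
        List<Path> bad = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p) && !isAvroContainer(p)) {
                    bad.add(p);
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : nonAvroFiles(dir)) {
            System.out.println("not an Avro data file: " + p);
        }
    }
}
```

For example, running it against a local copy of /testdata would list test.txt, which could then be moved to a different directory so that the table's LOCATION holds only Avro files.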

On 10/30/2013 01:50 AM, Valluri, Sathish wrote:
>
> Resending after disabling security signing.
>
> *From:*Valluri, Sathish [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, October 30, 2013 2:17 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Any sugesstions java.io.IOException: Not a data file error
>
> Hi All,
>
> Hive MapReduce jobs fail with the following *java.io.IOException:
> Not a data file* error if the HDFS directory contains files other than Avro files.
>
> I have created a Hive external table as follows:
>
> CREATE EXTERNAL TABLE testTable
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES ('avro.schema.literal'='{ <schema json literal>')
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/testdata/';
>
> I then run: select count(*) from testTable;
>
> When /testdata contains only Avro files, the query works fine and returns
> the correct results.
>
> If /testdata also contains files in some other format, say
> /testdata/test.txt, the query fails with the following error:
>
> java.io.IOException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
>     ... 11 more
> Caused by: java.io.IOException: Not a data file.
>     at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
>     at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
>     at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
>     at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
>     ... 16 more
>
> Can anyone suggest a parameter or other change that would make the
> query succeed? Ideally, Hive should skip files in other formats and
> read only the Avro files when processing the data on HDFS.
>
> Any suggestions for resolving this issue would be appreciated.
>
> Regards