Hive, mail # user - Infinite loop with ORC file and Hive 0.11


Infinite loop with ORC file and Hive 0.11
Iván de Prado 2013-09-05, 09:40
Hi,

We are using Hive 0.11 with the ORC file format, and some tasks get stuck
in what looks like an infinite loop. They keep running indefinitely when we
set a huge task expiry timeout. If we set the expiry time to 600 seconds, the
tasks fail because they do not report progress and, finally, the job fails.

The behavior is not consistent: it sometimes changes between job executions,
and it happens for different queries.

We are using Hive 0.11 with Hadoop hadoop-1.1.2.23 from Hortonworks. The
task that is blocked keeps consuming 100% of CPU, and the stack trace
is always the same. Everything points to some kind of infinite
loop. My guess is that it is related to the ORC file. Maybe some
pointer is written incorrectly, which causes an infinite loop
when reading. Or maybe there is a bug in the reading stage. Any ideas?
Should I file this as a bug in Jira?
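
As an illustration of how a decompression loop like the one suspected here can spin, the following is a minimal Java sketch. It is not the Hive ZlibCodec code, and it does not claim to reproduce the actual cause. The names GuardedInflate, inflateAll and deflateRaw are hypothetical, and the use of zlib-wrapped streams and a fixed output buffer is an assumption. The point it shows: java.util.zip.Inflater.inflate can return 0 without the stream being finished (output buffer full, or more input needed), so a loop that only tests finished() would never exit. The guard turns that case into an exception.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only (not part of Hive).
class GuardedInflate {

    // Compresses data with the default zlib wrapper; only used to build test input.
    static byte[] deflateRaw(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inflates input into out and returns the number of bytes written.
    // Throws instead of looping forever when inflate() stops making progress.
    static int inflateAll(byte[] input, byte[] out) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            int total = 0;
            while (!inflater.finished()) {
                int n = inflater.inflate(out, total, out.length - total);
                if (n == 0 && !inflater.finished()) {
                    // No bytes produced and the stream is not done: the output
                    // buffer is full, input is missing, or a dictionary is needed.
                    throw new DataFormatException(
                        "inflate made no progress (needsInput=" + inflater.needsInput()
                        + ", outputFull=" + (total == out.length) + ")");
                }
                total += n;
            }
            return total;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflateRaw(data);
        byte[] out = new byte[data.length + 16];
        int n = inflateAll(packed, out);
        System.out.println(new String(out, 0, n, StandardCharsets.UTF_8));
        try {
            inflateAll(packed, new byte[4]); // too small: would spin without the guard
        } catch (DataFormatException e) {
            System.out.println("stalled: " + e.getMessage());
        }
    }
}
```

With a guard of this kind, a stalled read would fail the task with an explicit error rather than consuming CPU until the task timeout expires.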

More information below. The stack trace:

------
"main" prio=10 tid=0x00007f20a000a800 nid=0x1ed2 runnable [0x00007f20a8136000]
   java.lang.Thread.State: RUNNABLE
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:256)
    - locked <0x00000000f42a6ca0> (a java.util.zip.ZStreamRef)
    at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:64)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:128)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:143)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVulong(SerializationUtils.java:54)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVslong(SerializationUtils.java:65)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.readValues(RunLengthIntegerReader.java:66)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.next(RunLengthIntegerReader.java:81)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:332)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:802)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1214)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:71)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:46)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
    - eliminated <0x00000000e1459700> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    - locked <0x00000000e1459700> (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
--------

We have seen the same stack trace repeatedly for several executions of
jstack.

The log file for this kind of task is the following:

-------
2013-09-04 23:12:34,332 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-09-04 23:12:34,681 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /hd/hd6/mapred/local/taskTracker/distcache/1758137359311570022_-1138434677_1812098585/master/tmp/hive-datasalt/hive_2013-09-04_22-45-13_829_2202639092470021957/-mr-10004/22027cf8-f583-41d7-adb8-e7e74922d113 <- /hd/hd1/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/attempt_201309040511_0003_m_000803_1/work/HIVE_PLAN22027cf8-f583-41d7-adb8-e7e74922d113
2013-09-04 23:12:34,718 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /hd/hd5/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/jars/com <- /hd/hd1/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/attempt_201309040511_0003_m_000803_1/work/com
2013-09-04 23:12:34,733 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /hd/hd5/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/jars/javolution <- /hd/hd1/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/attempt_201309040511_0003_m_000803_1/work/javolution
2013-09-04 23:12:34,743 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /hd/hd5/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/jars/org <- /hd/hd1/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/attempt_201309040511_0003_m_000803_1/work/org
2013-09-04 23:12:34,756 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /hd/hd5/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/jars/job.jar <- /hd/hd1/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_0003/attempt_201309040511_0003_m_000803_1/work/job.jar
2013-09-04 23:12:34,768 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /hd/hd5/mapred/local/taskTracker/datasalt/jobcache/job_201309040511_000