|
|
-
Thrift service + zklocks + time = death
Edward Capriolo 2012-04-11, 14:29
Hey all,
Running 0.7.1 We are using a combination of hive-thrift+ zookeeper for locking. All works well for a couple days until eventually we hit this condition.
HiveServerException(message:Query returned non-zero code: 10, cause: FAILED: Error in acquiring locks: locks on the underlying objects cannot be acquired. retry after some time, errorCode:10, SQLState:42000) at org.apache.hadoop.hive.service.ThriftHive$execute_result.read(ThriftHive.java:1494) at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:120) at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:95) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) XXX XXX XXX at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145) XXX XXX at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:149) at core.ScriptTemplate.run(ScriptTemplate.groovy:28) at core.ScriptTemplate$run.call(Unknown Source) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) at core.ScriptTemplateRunner.main(ScriptTemplateRunner.groovy:19) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:88) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1302) at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:767) at groovy.lang.GroovyShell.runScriptOrMainOrTestOrRunnable(GroovyShell.java:273) at groovy.lang.GroovyShell.run(GroovyShell.java:229) at groovy.lang.GroovyShell.run(GroovyShell.java:159) at groovy.ui.GroovyMain.processOnce(GroovyMain.java:496) at groovy.ui.GroovyMain.run(GroovyMain.java:311) at groovy.ui.GroovyMain.process(GroovyMain.java:297) at groovy.ui.GroovyMain.processArgs(GroovyMain.java:112) at groovy.ui.GroovyMain.main(GroovyMain.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:108) at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:130) }
It seems that eventually the hivethrifts connection to ZK goes stale and then all queries fail with this message. Delta to event might be 3 days of 100 session. Any ideas?
Thanks, Edward
+
Edward Capriolo 2012-04-11, 14:29
-
Re: Thrift service + zklocks + time = death
Edward Capriolo 2012-04-11, 14:43
BTW the problem is not tables being locked. On the server side acquireLocks is getting a null pointer.
32432495-OK 32432498:Error in acquireLocks: java.lang.NullPointerException 32432552-FAILED: Error in acquiring locks: locks on the underlying objects cannot be acquired. retry after some time 32432660-Hive history file=/tmp/hadoop/hive_job_log_hadoop_201204111032_691199436.txt 32432737-Hive history file=/tmp/hadoop/hive_job_log_hadoop_201204111032_1845073466.txt On Wed, Apr 11, 2012 at 10:29 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > Hey all, > > Running 0.7.1 > We are using a combination of hive-thrift+ zookeeper for locking. All > works well for a couple days until eventually we hit this condition. > > HiveServerException(message:Query returned non-zero code: 10, cause: > FAILED: Error in acquiring locks: locks on the underlying objects > cannot be acquired. retry after some time, errorCode:10, > SQLState:42000) > at org.apache.hadoop.hive.service.ThriftHive$execute_result.read(ThriftHive.java:1494) > at org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:120) > at org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:95) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229) > at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52) > at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) > XXX > XXX > XXX > at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145) > XXX > XXX > at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:149) > at core.ScriptTemplate.run(ScriptTemplate.groovy:28) > at core.ScriptTemplate$run.call(Unknown Source) > at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) > at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) > at core.ScriptTemplateRunner.main(ScriptTemplateRunner.groovy:19) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:88) > at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) > at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1302) > at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:767) > at groovy.lang.GroovyShell.runScriptOrMainOrTestOrRunnable(GroovyShell.java:273) > at groovy.lang.GroovyShell.run(GroovyShell.java:229)
+
Edward Capriolo 2012-04-11, 14:43
-
Re: Thrift service + zklocks + time = death
Ashutosh Chauhan 2012-04-11, 14:49
Premise of the locking is that in well-behaved queries you will obtain a lock at the start of the query and then relinquish it at the end of query. Since, locks are lease-based, in badly-behaving queries after lease expires, lock will be relinquished. Are you are NPE for same repeated query after couple of days it executing fine or you have many different queries of which some acquire locks while other dont? Do you see the failure on same kind of query every time or its a different query every time. It will be useful to find out which queries are resulting in this to debug this further.
Ashutosh On Wed, Apr 11, 2012 at 07:43, Edward Capriolo <[EMAIL PROTECTED]>wrote:
> BTW the problem is not tables being locked. On the server side > acquireLocks is getting a null pointer. > > 32432495-OK > 32432498:Error in acquireLocks: java.lang.NullPointerException > 32432552-FAILED: Error in acquiring locks: locks on the underlying > objects cannot be acquired. retry after some time > 32432660-Hive history > file=/tmp/hadoop/hive_job_log_hadoop_201204111032_691199436.txt > 32432737-Hive history > file=/tmp/hadoop/hive_job_log_hadoop_201204111032_1845073466.txt > > > On Wed, Apr 11, 2012 at 10:29 AM, Edward Capriolo <[EMAIL PROTECTED]> > wrote: > > Hey all, > > > > Running 0.7.1 > > We are using a combination of hive-thrift+ zookeeper for locking. All > > works well for a couple days until eventually we hit this condition. > > > > HiveServerException(message:Query returned non-zero code: 10, cause: > > FAILED: Error in acquiring locks: locks on the underlying objects > > cannot be acquired. retry after some time, errorCode:10, > > SQLState:42000) > > at > org.apache.hadoop.hive.service.ThriftHive$execute_result.read(ThriftHive.java:1494) > > at > org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:120) > > at > org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:95) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at > org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229) > > at > org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52) > > at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) > > XXX > > XXX > > XXX > > at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145) > > XXX > > XXX > > at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:149) > > at core.ScriptTemplate.run(ScriptTemplate.groovy:28) > > at core.ScriptTemplate$run.call(Unknown Source) > > at > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) > > at > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
+
Ashutosh Chauhan 2012-04-11, 14:49
-
Re: Thrift service + zklocks + time = death
Edward Capriolo 2012-04-11, 15:04
Once hive gets into this condition every meaningful query fails.
Calling this: h.client.execute("describe fracture_act"); h.client.execute("show locks"); Will throw the exception.
I do not know if a specific query is causing this but I doubt it. We have been using hive locks by default for months now doing single shot 'hive -e "myquery"' stuff. I have never found hive locks in this state with the fat client.
I feel that since hive-thrift is long lived something inside of it must be losing a ZKHandle or can not deal with ZK timing out. The fat client is likely never to hit this condition.
Edward
On Wed, Apr 11, 2012 at 10:49 AM, Ashutosh Chauhan <[EMAIL PROTECTED]> wrote: > Premise of the locking is that in well-behaved queries you will obtain a > lock at the start of the query and then relinquish it at the end of query. > Since, locks are lease-based, in badly-behaving queries after lease expires, > lock will be relinquished. > Are you are NPE for same repeated query after couple of days it executing > fine or you have many different queries of which some acquire locks while > other dont? Do you see the failure on same kind of query every time or its a > different query every time. It will be useful to find out which queries are > resulting in this to debug this further. > > Ashutosh > On Wed, Apr 11, 2012 at 07:43, Edward Capriolo <[EMAIL PROTECTED]> > wrote: >> >> BTW the problem is not tables being locked. On the server side >> acquireLocks is getting a null pointer. >> >> 32432495-OK >> 32432498:Error in acquireLocks: java.lang.NullPointerException >> 32432552-FAILED: Error in acquiring locks: locks on the underlying >> objects cannot be acquired. retry after some time >> 32432660-Hive history >> file=/tmp/hadoop/hive_job_log_hadoop_201204111032_691199436.txt >> 32432737-Hive history >> file=/tmp/hadoop/hive_job_log_hadoop_201204111032_1845073466.txt >> >> >> On Wed, Apr 11, 2012 at 10:29 AM, Edward Capriolo <[EMAIL PROTECTED]> >> wrote: >> > Hey all, >> > >> > Running 0.7.1 >> > We are using a combination of hive-thrift+ zookeeper for locking. All >> > works well for a couple days until eventually we hit this condition. >> > >> > HiveServerException(message:Query returned non-zero code: 10, cause: >> > FAILED: Error in acquiring locks: locks on the underlying objects >> > cannot be acquired. retry after some time, errorCode:10, >> > SQLState:42000) >> > at >> > org.apache.hadoop.hive.service.ThriftHive$execute_result.read(ThriftHive.java:1494) >> > at >> > org.apache.hadoop.hive.service.ThriftHive$Client.recv_execute(ThriftHive.java:120) >> > at >> > org.apache.hadoop.hive.service.ThriftHive$Client.execute(ThriftHive.java:95) >> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> > at >> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >> > at >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> > at java.lang.reflect.Method.invoke(Method.java:597) >> > at >> > org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:229) >> > at >> > org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52) >> > at >> > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40) >> > at >> > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) >> > at >> > org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) >> > XXX >> > XXX >> > XXX >> > at >> > org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:44) >> > at >> > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:141) >> > at >> > org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145)
+
Edward Capriolo 2012-04-11, 15:04
|
|