MapReduce >> mail # user >> Possible deadlock in the JobClient


Possible deadlock in the JobClient
Hi,

I think I have run into a possible deadlock, though I am not sure whether it actually is one :-)
Here is my scenario:

Run a job and call JobClient.monitorAndPrintJob to monitor it and receive status updates.
In parallel, from another thread, try to invoke JobClient$NetworkedJob.killJob.

For reference, here are the thread dumps for both operations:
"MrPlanRunner" daemon prio=5 tid=7fe12cacf000 nid=0x11352f000 in Object.wait() [11352d000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:1145)
- locked <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.Client.call(Client.java:1122)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:148)
at $Proxy40.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getApplicationReport(ClientRMProtocolPBClientImpl.java:116)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:343)
at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:143)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
- locked <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:483)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:319)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:319)
- locked <7f4f70fc0> (a org.apache.hadoop.mapreduce.Job)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:598)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1280)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:432)
at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:902)
at xxxxx.runJob(xxxxx.java:74)
at xxxxx.doExecute(xxxxx.java:39)
at xxxxx.doExecute(xxxxx.java:1)
at xxxxexecute(xxxxxx.java:29)
at xxxx.MrPlanRunner.run(xxxxx.java:117)
at java.lang.Thread.run(Thread.java:680)

"Thread-2" prio=5 tid=7fe12e2de800 nid=0x114d15000 waiting for monitor entry [114d13000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:286)
- waiting to lock <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
at org.apache.hadoop.mapred.YARNRunner.killJob(YARNRunner.java:509)
at org.apache.hadoop.mapreduce.Job.killJob(Job.java:622)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:319)
- locked <7f4f8fa68> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
at xxxx.cancelCurrentJob(xxxxxx.java:150)
at xxxx.cancel(xxxxx.java:171)
at xxxx.testCancelJob(xxxx.java:135)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)

In the thread dump we can see that the object <7f4d78950> (a ClientServiceDelegate) is locked by the MrPlanRunner thread (the thread calling JobClient.monitorAndPrintJob). That thread is itself WAITING on an RPC call (a Client$Call) while it holds the lock. Thread-2 (the thread calling JobClient$NetworkedJob.killJob) tries to lock the same object and is BLOCKED.
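
To make the pattern in the dump easier to reason about, here is a minimal, self-contained Java sketch. It is a model only, not Hadoop code: the `delegate` and `call` objects and the `monitor`/`killer` thread names are hypothetical stand-ins for the ClientServiceDelegate lock, the pending Client$Call, and the two threads above. It assumes the "RPC reply" eventually arrives. One thread holds the shared lock while waiting for a reply, and a second thread blocks on that lock. The sketch then asks the JVM whether it sees a monitor deadlock cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockHeldWhileWaiting {

    /** Snapshot taken while the second thread is stuck on the shared lock. */
    static final class Observation {
        final Thread.State monitorState;
        final Thread.State killerState;
        final long[] deadlockedIds; // null when the JVM finds no monitor lock cycle

        Observation(Thread.State monitorState, Thread.State killerState, long[] deadlockedIds) {
            this.monitorState = monitorState;
            this.killerState = killerState;
            this.deadlockedIds = deadlockedIds;
        }
    }

    static Observation observe() {
        final Object delegate = new Object();  // hypothetical stand-in for the shared delegate lock
        final Object call = new Object();      // hypothetical stand-in for the pending RPC call
        final boolean[] replied = {false};     // guarded by the monitor of 'call'
        final CountDownLatch holdingLock = new CountDownLatch(1);

        Thread monitor = new Thread(() -> {
            synchronized (delegate) {          // take the shared lock first
                holdingLock.countDown();
                synchronized (call) {
                    while (!replied[0]) {
                        try {
                            call.wait();       // releases 'call' only; 'delegate' stays locked
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, "monitor");

        Thread killer = new Thread(() -> {
            synchronized (delegate) {
                // a kill request would be issued here once the lock is free
            }
        }, "killer");

        try {
            monitor.start();
            holdingLock.await();               // make sure 'monitor' owns the lock before 'killer' starts
            killer.start();

            // Wait until both threads have reached the states seen in the dump.
            while (monitor.getState() != Thread.State.WAITING
                    || killer.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }

            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            Observation seen = new Observation(
                    monitor.getState(), killer.getState(), mx.findMonitorDeadlockedThreads());

            // Simulate the reply arriving so both threads can finish.
            synchronized (call) {
                replied[0] = true;
                call.notifyAll();
            }
            monitor.join();
            killer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        }
    }

    public static void main(String[] args) {
        Observation o = observe();
        System.out.println("monitor=" + o.monitorState
                + " killer=" + o.killerState
                + " lockCycle=" + (o.deadlockedIds != null));
    }
}
```

In this model the second thread shows up as BLOCKED, exactly as Thread-2 does, yet the JVM reports no lock cycle, and the thread proceeds as soon as the reply arrives. If the reply never arrived, the second thread would stay blocked indefinitely, which looks like a deadlock from the outside even though no two threads are waiting on each other's locks. Whether that applies to the real JobClient depends on whether the getApplicationReport RPC ever returns, which the dump alone does not show.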

Please let me know whether this is a problem in the code or whether my usage of the API is incorrect.
The build being used is 0.23.1-cdh4.0.0b2.

Cheers,
Subroto Sanyal