Possible deadlock in the JobClient
Hi,

I think I have run into a possible deadlock, though I am not sure whether it actually is one :-)
Here is my scenario:

Run a job and call JobClient.monitorAndPrintJob to monitor it and receive status updates.
In parallel, invoke JobClient$NetworkedJob.killJob from another thread.

For reference, here are the thread dumps for both operations:
"MrPlanRunner" daemon prio=5 tid=7fe12cacf000 nid=0x11352f000 in Object.wait() [11352d000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:1145)
- locked <7f3c55668> (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.Client.call(Client.java:1122)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:148)
at $Proxy40.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getApplicationReport(ClientRMProtocolPBClientImpl.java:116)
at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:343)
at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:143)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
- locked <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:483)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:322)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:319)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:319)
- locked <7f4f70fc0> (a org.apache.hadoop.mapreduce.Job)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:598)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1280)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:432)
at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:902)
at xxxxx.runJob(xxxxx.java:74)
at xxxxx.doExecute(xxxxx.java:39)
at xxxxx.doExecute(xxxxx.java:1)
at xxxxexecute(xxxxxx.java:29)
at xxxx.MrPlanRunner.run(xxxxx.java:117)
at java.lang.Thread.run(Thread.java:680)

"Thread-2" prio=5 tid=7fe12e2de800 nid=0x114d15000 waiting for monitor entry [114d13000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:286)
- waiting to lock <7f4d78950> (a org.apache.hadoop.mapred.ClientServiceDelegate)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:373)
at org.apache.hadoop.mapred.YARNRunner.killJob(YARNRunner.java:509)
at org.apache.hadoop.mapreduce.Job.killJob(Job.java:622)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:319)
- locked <7f4f8fa68> (a org.apache.hadoop.mapred.JobClient$NetworkedJob)
at xxxx.cancelCurrentJob(xxxxxx.java:150)
at xxxx.cancel(xxxxx.java:171)
at xxxx.testCancelJob(xxxx.java:135)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
In the thread dump we can see that the object "7f4d78950" (an org.apache.hadoop.mapred.ClientServiceDelegate) is locked by the MrPlanRunner thread (the thread calling JobClient.monitorAndPrintJob), which is itself waiting inside Client.call. Thread-2 (the thread calling JobClient$NetworkedJob.killJob) tries to lock the same object and is BLOCKED.
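
To make the pattern easier to discuss, here is a minimal, self-contained sketch of what I believe the dump shows: one thread holds an object's monitor while it waits for a reply, and a second thread blocks on entering a synchronized method of the same object. This is not Hadoop code. The class Delegate, its invoke method, and the latch that stands in for the RPC reply are hypothetical names made up for illustration; only the thread names are taken from the dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MonitorContentionSketch {

    // Hypothetical stand-in for ClientServiceDelegate: all calls funnel
    // through one synchronized method, so they share the object's monitor.
    static class Delegate {
        private final CountDownLatch rpcReply = new CountDownLatch(1);
        private final CountDownLatch insideInvoke = new CountDownLatch(1);

        synchronized String invoke(String method, boolean waitsForReply)
                throws InterruptedException {
            insideInvoke.countDown(); // signal that the monitor is now held
            if (waitsForReply) {
                // Simulates waiting for an RPC response. The wait is on a
                // different object, so the Delegate monitor stays held.
                rpcReply.await();
            }
            return method + " done";
        }
    }

    private static Thread daemon(String name, Delegate d, String method, boolean waits) {
        Thread t = new Thread(() -> {
            try {
                d.invoke(method, waits);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    // Returns the state of the "kill" thread while the "monitor" thread
    // still holds the Delegate monitor and waits for its simulated reply.
    static Thread.State observeKillerState() throws InterruptedException {
        Delegate delegate = new Delegate();
        Thread monitor = daemon("MrPlanRunner", delegate, "getJobStatus", true);
        Thread killer = daemon("Thread-2", delegate, "killJob", false);

        monitor.start();
        delegate.insideInvoke.await(); // monitor thread is inside invoke()
        killer.start();

        // Poll (with a 5 second cap) until the killer thread is blocked.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (killer.getState() != Thread.State.BLOCKED
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = killer.getState();

        delegate.rpcReply.countDown(); // the "reply" arrives, lock is released
        monitor.join();
        killer.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Thread-2 state while lock is held: " + observeKillerState());
    }
}
```

In this sketch the second thread is only blocked for as long as the first one waits for its reply, and both finish once the reply arrives. If the same holds for the real code, what I am seeing would be lock contention behind a slow or hanging RPC rather than a true deadlock, but I cannot confirm that from the dump alone.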

Please let me know whether this is a problem in the code or whether my usage of the API is incorrect.
The build being used is: 0.23.1-cdh4.0.0b2

Cheers,
Subroto Sanyal