-
killApplication doesn't kill AppMaster
Bo Wang 2012-08-22, 23:01
Hello,

I have an AM listening on a port. I kill the application by sending a request via ClientRMProtocol#killApplication. In the NM log, the AM's container transitions from RUNNING to KILLING to CONTAINER_CLEANEDUP_AFTER_KILL to DONE. However, the AM is still running and the port is not released. What is going wrong here?

Thanks,
Bo
-
Re: killApplication doesn't kill AppMaster
Arun C Murthy 2012-08-22, 23:22
Did you grab a stack trace of the AM?

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
-
Re: killApplication doesn't kill AppMaster
Bo Wang 2012-08-23, 00:20
Thanks for looking into this, Arun.

I am not sure when to grab the stack trace of the AM. The AM's stdout/stderr contain no stack trace or exception.

Btw, I am curious how the NM kills a container. Does it directly kill the JVM process?

Thanks,
Bo
-
Re: killApplication doesn't kill AppMaster
Vinod Kumar Vavilapalli 2012-08-23, 02:29
> I am not sure when to grab the stack trace of the AM. In the stdout/stderr
> of AM, no stack trace (or exception) is emitted.

You can log in to the node and, if the process is still alive, run "kill -3", which will dump the threads' status to stderr.

> Btw, I am curious how NM kills a container. Does it directly kill the JVM
> process?

NM directly kills the JVM with a SIGTERM followed by a SIGKILL.

BTW, please also check the corresponding NM's logs for any exception/error, which could mean a bug in the NM code.

HTH,
+Vinod
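For readers who want to see the "SIGTERM followed by a SIGKILL" sequence in isolation, here is a minimal Python sketch. It is not YARN code: the helper name stop_process, the 0.25 second grace period, and the stand-in child that ignores SIGTERM are all assumptions made for the example. (The NM log quoted later in this thread shows roughly 0.3 seconds between the two signals.)

```python
import signal
import subprocess
import sys


def stop_process(proc, grace_seconds=0.25):
    """Send SIGTERM, wait briefly, then fall back to SIGKILL.

    Returns the child's exit code (negative signal number on POSIX).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()


# Hypothetical stand-in for a process that does not exit on SIGTERM.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    code = stop_process(proc)
    proc.stdout.close()
    return code
```

Note that this only signals the single pid it was given. Whether that pid is the process doing the real work is a separate question.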
-
Re: killApplication doesn't kill AppMaster
Bo Wang 2012-08-29, 22:28
Hi Vinod,

Thanks for the suggestion. I was busy with some other issues before getting back to this one. Sorry for the late reply.

I tried to kill the process with "kill -3", but it was not interrupted. Then I used "kill -9", which sent a SIGKILL, and the process was killed. I checked the stderr and used jstack to dump the stack trace. Everything looks normal. In fact, I simplified my test AM to be just an empty while loop.

I looked through the code to find where the SIGKILL is sent in YARN but didn't find it. I traced down to NodeManager.stopContainer, but didn't see it there. Would you mind sending me a pointer to the actual code?

Thanks,
Bo
-
Re: killApplication doesn't kill AppMaster
Vinod Kumar Vavilapalli 2012-08-30, 00:23
Please attach your jstack dump; maybe I can spot something.

Pointer for what you asked: ContainerManagerImpl.stopContainer() -> ContainerImpl.KillTransition -> ContainersLauncher -> ContainerLaunch.cleanupContainer(). Follow the events carefully.

HTH,
+Vinod
-
Re: killApplication doesn't kill AppMaster
Bo Wang 2012-08-30, 20:43
The call graph is very useful. Thanks, Vinod.

I traced the code and enabled debug logging, and found something interesting.

While the AM was running, I ran "ps aux | grep SampleAM" and found two running processes:

  34990 /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM
  34984 /bin/bash -c /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java SampleAM 1>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stdout 2>/tmp/logs/application_1346348588670_0011/container_1346348588670_0011_01_000001/stderr

After killing the application, I found the following in the NM log:

  12/08/30 13:29:27,542 DEBUG [AsyncDispatcher event handler] nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 34984 as user bo.wang
  12/08/30 13:29:27,836 DEBUG [Task killer for 34984] nodemanager.DefaultContainerExecutor: Sending signal 9 to pid 34984 as user bo.wang

It looks like the NM only kills process 34984 (the bash wrapper), not 34990 (the JVM). As a result, process 34990 is still running after the kill.

Is this a bug? BTW, I am running on my MacBook, which may be the reason YARN is using DefaultContainerExecutor rather than LinuxContainerExecutor.

Thanks,
Bo
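The situation above, where a shell wrapper receives the signals while its child keeps running, can be avoided on POSIX systems by signalling a whole process group instead of a single pid. The Python sketch below illustrates that general technique only; it is not taken from YARN, and the function names, the "sleep 30" stand-in for the JVM, and the timings are assumptions for the example.

```python
import os
import signal
import subprocess
import time


def launch_wrapped():
    """Start a shell wrapper whose child does the real work.

    start_new_session=True calls setsid() in the child, so the wrapper
    becomes the leader of a new process group that its children inherit.
    """
    return subprocess.Popen(
        ["sh", "-c", "sleep 30 & wait"],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )


def kill_group(proc):
    """Signal every process in the wrapper's group, not just the wrapper."""
    # The group id equals the wrapper's pid because it is the group leader.
    os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()


def seconds_until_pipe_closes(proc):
    """Read to EOF; EOF arrives only once every process holding the
    write end of the pipe (wrapper and child) has exited."""
    start = time.monotonic()
    proc.stdout.read()
    proc.stdout.close()
    return time.monotonic() - start


def demo():
    proc = launch_wrapped()
    time.sleep(0.2)  # give the shell time to fork its child
    kill_group(proc)
    return seconds_until_pipe_closes(proc)
```

If kill_group used os.kill(proc.pid, ...) instead, only the wrapper would die and the read would block until the orphaned sleep finished on its own, which mirrors the two-process picture in the ps output above.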
-
Re: killApplication doesn't kill AppMaster
Bo Wang 2012-09-01, 00:12
Created a JIRA here.
https://issues.apache.org/jira/browse/YARN-76