Re: Yarn -- one of the daemons getting killed
Krishna Kishore Bonagiri 2013-12-18, 12:04
Thanks for the link. I went through it, and it looks like the OOM killer
picks the process that has the highest oom_score. I captured the
oom_score of all the YARN daemon processes after each run of my
application. The first time I captured these details, I saw that the
NameNode was killed, whereas the NodeManager had the highest score. So I
don't know whether it is really the OOM killer that killed it!
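A minimal sketch of the oom_score capture described above. It assumes a Linux /proc filesystem and that the daemons can be recognised by "java" in their command line. The function `read_oom_scores` and its `name_filter` parameter are hypothetical helpers, not part of YARN or any Hadoop tool. Keep in mind that a process that has already been killed no longer appears in /proc, so the snapshot is only useful if it is taken before the kill.

```python
from pathlib import Path


def read_oom_scores(proc_root="/proc", name_filter="java"):
    """Return {pid: (oom_score, cmdline)} for processes whose command line
    contains name_filter. proc_root is a parameter (hypothetical helper) so
    the function can also be pointed at a fake /proc tree for testing."""
    results = {}
    root = Path(proc_root)
    if not root.is_dir():
        return results  # e.g. not running on Linux
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self, /proc/meminfo
        try:
            # /proc/<pid>/cmdline holds NUL-separated arguments; join them with spaces.
            raw = (entry / "cmdline").read_bytes()
            cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            score = int((entry / "oom_score").read_text().strip())
        except (OSError, ValueError):
            # The process may have exited between listing and reading,
            # or we may lack permission to read its files.
            continue
        if name_filter in cmdline:
            results[int(entry.name)] = (score, cmdline)
    return results


if __name__ == "__main__":
    scores = read_oom_scores()
    # Highest oom_score first: the process the OOM killer is most likely to pick.
    for pid, (score, cmd) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
        print(f"{pid:>7} {score:>5}  {cmd[:100]}")
```

Running this periodically (for example from a loop or cron) while the application runs, rather than only afterwards, makes it possible to compare the victim's score against the others just before it disappeared.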
Please see the attached output of my runs, which also includes the output
of the free command after each run. The free output doesn't show any
exhaustion of system memory either.
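A snapshot from free only shows memory at the moment it runs, so it can miss a short spike. When the OOM killer acts, the kernel writes a message to its log, which syslog usually stores in /var/log/messages (RHEL-style systems) or /var/log/syslog / kern.log (Debian/Ubuntu). A minimal sketch for scanning such a log follows; the regular expression is an assumption, since the exact wording of the kernel messages varies between kernel versions, and `find_oom_kills` is a hypothetical helper.

```python
import re
import sys

# Assumed patterns: kernels log lines roughly like
#   "Out of memory: Kill process 1234 (java) ..."
#   "Killed process 1234 (java) ..."
# The wording differs between kernel versions; adjust to what yours logs.
OOM_PATTERN = re.compile(r"out of memory|killed process\s+(\d+)", re.IGNORECASE)


def find_oom_kills(lines):
    """Return (line, pid or None) for every line that looks like an OOM-killer message."""
    hits = []
    for line in lines:
        m = OOM_PATTERN.search(line)
        if m:
            # Only the "Killed process <pid>" form captures a PID.
            pid = int(m.group(1)) if m.group(1) else None
            hits.append((line.rstrip("\n"), pid))
    return hits


if __name__ == "__main__":
    # Default path is an assumption; pass another log file (or saved dmesg output) as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            for line, pid in find_oom_kills(f):
                print(pid, line)
    except OSError as e:
        print(f"could not read {path}: {e}", file=sys.stderr)
```

If this finds nothing around the time a daemon died, that is some evidence the process was terminated by something other than the OOM killer.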
One more thing I did today: I added audit rules for each of the daemons
to capture all system calls. In the audit log, I see futex() system calls
in the killed daemon processes. I don't know whether that causes the
daemon to die, or why that call happens...
On Wed, Dec 18, 2013 at 12:31 AM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:
> That's good info. It is more than likely that it is the OOM killer. See
> http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for example.
> On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri <
> [EMAIL PROTECTED]> wrote:
> Hi Jeff,
> I ran the ResourceManager in the foreground without nohup, and here are
> the messages from when it was killed. It says "Killed" but doesn't say why:
> 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
> appattempt_1387266015651_0258_000001 released container
> container_1387266015651_0258_01_000003 on node: host: isredeng:36576
> #containers=2 available=7936 used=256 with event: FINISHED
> 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
> container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED
> to RUNNING
> On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <[EMAIL PROTECTED]> wrote:
>> What if you open the daemons in a "screen" session rather than running
>> them in the background -- for example, run "yarn resourcemanager". Then you
>> can see exactly when they terminate, and hopefully why.
>> *From: *Krishna Kishore Bonagiri
>> *Sent: *Monday, December 16, 2013 6:20 AM
>> *To: *[EMAIL PROTECTED]
>> *Reply To: *[EMAIL PROTECTED]
>> *Subject: *Re: Yarn -- one of the daemons getting killed
>> Hi Vinod,
>> Yes, I am running on Linux.
>> I was actually searching for a corresponding message in
>> /var/log/messages to confirm that the OOM killer killed my daemons, but
>> could not find any there! According to the following link, it looks
>> like if it is a memory issue, I should see a message even if the OOM
>> killer is disabled, but I don't see one.
>> And is memory consumption higher in a two-node cluster than in a
>> single-node one? Also, I see this problem only when I give "*" as the node
>> One other thing I suspected was the limit on the number of user processes;
>> I increased it from 1024 to 31000, but that didn't help either.
>> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
>> [EMAIL PROTECTED]> wrote:
>>> Yes, that is what I suspect. That is why I asked if everything is on a
>>> single node. If you are running Linux, the Linux OOM killer may be shooting
>>> things down. When that happens, you will see something like "killed process"
>>> in the system's syslog.
>>> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>>> [EMAIL PROTECTED]> wrote:
>>> One more thing I observed is that my client, which continuously submits
>>> Application Masters one after another, also gets killed sometimes. So it is
>>> always one of the Java processes that gets killed. Does that indicate
>>> some excessive memory usage by them, or something like that, that is causing