Re: Yarn -- one of the daemons getting killed
Krishna Kishore Bonagiri 2013-12-17, 09:26
I have run the resource manager in the foreground without nohup, and here
are the messages from when it was killed. It says "Killed" but doesn't say
13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
appattempt_1387266015651_0258_000001 released container
container_1387266015651_0258_01_000003 on node: host: isredeng:36576
#containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED
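A bare "Killed" from the shell normally means the process received SIGKILL, which is also the signal the Linux OOM killer sends, but it does not identify the sender. As a minimal sketch (the helper names `describe_exit` and `run_and_report` are made up for illustration, and nothing here is part of YARN), a small Python wrapper can at least record which signal ended a foreground daemon:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated
        # signal number, e.g. -9 for SIGKILL.
        sig = signal.Signals(-returncode)
        return "terminated by signal %s (%d)" % (sig.name, sig.value)
    return "exited with status %d" % returncode


def run_and_report(command):
    """Run command in the foreground and describe how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python watch_exit.py yarn resourcemanager
    print(run_and_report(sys.argv[1:]))
```

If the report names SIGKILL, the daemon did not choose to exit; something outside the JVM ended it, which is why its own log has no closing message.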
On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <[EMAIL PROTECTED]> wrote:
> What if you open the daemons in a "screen" session rather than running
> them in the background -- for example, run "yarn resourcemanager". Then you
> can see exactly when they terminate, and hopefully why.
> *From: *Krishna Kishore Bonagiri
> *Sent: *Monday, December 16, 2013 6:20 AM
> *To: *[EMAIL PROTECTED]
> *Reply To: *[EMAIL PROTECTED]
> *Subject: *Re: Yarn -- one of the daemons getting killed
> Hi Vinod,
> Yes, I am running on Linux.
> I was actually searching for a corresponding message in /var/log/messages
> to confirm that OOM killed my daemons, but could not find any corresponding
> messages there! According to the following link, it looks like if it is a
> memory issue, I should see a message even if OOM is disabled, but I don't
> see it.
> And is memory consumption higher on a two-node cluster than on a
> single-node one? Also, I see this problem only when I give "*" as the node
> One other thing I suspected was the allowed number of user processes.
> I increased that from 1024 to 31000, but that didn't help either.
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <
> [EMAIL PROTECTED]> wrote:
>> Yes, that is what I suspect. That is why I asked if everything is on a
>> single node. If you are running Linux, the Linux OOM killer may be shooting
>> things down. When that happens, you will see something like "Killed process"
>> in the system's syslog.
>> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <
>> [EMAIL PROTECTED]> wrote:
>> One more thing I observed is that my client, which submits Application
>> Masters one after another continuously, also gets killed sometimes. So it is
>> always one of the Java processes that is getting killed. Does this indicate
>> excessive memory usage by them, or something similar, that is causing
>> them to die? If so, how can we resolve this kind of issue?
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <
>> [EMAIL PROTECTED]> wrote:
>>> No, I am running on a 2-node cluster.
>>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <
>>> [EMAIL PROTECTED]> wrote:
>>>> Is all of this on a single node?
>>>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <
>>>> [EMAIL PROTECTED]> wrote:
>>>> I am running a small application on YARN (2.2.0) in a loop of 500
>>>> iterations, and while doing so one of the daemons (node manager, resource
>>>> manager, or data node) is getting killed (I mean disappearing) at a random
>>>> point. I see no information in the corresponding log files. How can I
>>>> find out why this is happening?
>>>> One more observation: this happens only when I am
>>>> using "*" for the node name in the container requests. When I use a
>>>> specific node name, everything is fine.
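The syslog check suggested in this thread can be automated. The following is only a sketch: the function names are hypothetical, and the pattern assumes the common kernel wording "Killed process <pid> (<command>)", which differs slightly between kernel versions.

```python
import re

# Matches kernel lines such as "Killed process 1234 (java) ...".
# The wording varies between kernel versions, so the match is
# case-insensitive and relies only on the pid and the command name.
OOM_PATTERN = re.compile(r"killed process (\d+) \(([^)]*)\)", re.IGNORECASE)


def find_oom_kills(lines):
    """Return (pid, command) pairs for every matching OOM-killer line."""
    kills = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def scan_file(path="/var/log/messages"):
    """Scan one log file; the default is the path mentioned in the thread."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)
```

Reading /var/log/messages usually requires root. Some distributions write kernel messages to a different file or to the journal, so an empty result from one file does not by itself rule out the OOM killer; the output of `dmesg` is worth checking as well.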