Re: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0
Hi Arun,
I was running on a single-node cluster, so all of my 100+ containers were on
a single node. The problem went away when I increased YARN_HEAP_SIZE to
2GB.

Thanks,
Kishore
On Thu, Aug 1, 2013 at 5:01 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> How many containers are you running per node?
>
> On Jul 25, 2013, at 5:21 AM, Krishna Kishore Bonagiri <
> [EMAIL PROTECTED]> wrote:
>
> Hi Devaraj,
>
> I used to run this application successfully with the same number of
> containers on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing
> with the new version because YARN itself is creating more threads than the
> previous versions did?
>
> Thanks,
> Kishore
>
>
> On Thu, Jul 25, 2013 at 4:24 PM, Devaraj k <[EMAIL PROTECTED]> wrote:
>
>> Hi Kishore,
>>
>> It seems that the system doesn't have enough resources to launch a new
>> thread.
>>
>> Could you check whether the system can support launching the configured
>> number of containers, and try increasing the native memory available by
>> reducing the number of processes running on the system?
>>
>> Thanks,
>> Devaraj k
>>
>> From: Krishna Kishore Bonagiri [mailto:[EMAIL PROTECTED]]
>> Sent: 25 July 2013 16:09
>> To: [EMAIL PROTECTED]
>> Subject: Node manager crashing when running an app requiring 100
>> containers on hadoop-2.1.0-beta RC0
>>
>> Hi,
>>
>> I am running an application against hadoop-2.1.0-beta RC0, and my app
>> requires 117 containers. All of the containers were allocated, but while
>> they were starting, at around the 99th container, the node manager went
>> down with the following error in its log. I could also reproduce this
>> error by running a "sleep 200; date" command with the Distributed Shell
>> example, in which case the error appeared at around the 66th container.
>>
>> 2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error.  Shutting down now...
>> java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11
>>         at java.lang.Thread.startImpl(Native Method)
>>         at java.lang.Thread.start(Thread.java:887)
>>         at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
>>         at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
>>         at java.security.AccessController.doPrivileged(AccessController.java:202)
>>         at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
>> 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
>>
>> Thanks,
>> Kishore
>>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
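
For readers who hit the same trace: on Linux, errno 11 is EAGAIN, which thread creation returns when a resource limit (such as the per-user process/thread limit shown by `ulimit -u`) or available memory is exhausted. The thread does not establish which limit was hit here, so the following is only a minimal diagnostic sketch, assuming a Linux host and Python 3. The helper names (`errno_from_log`, `describe_errno`, `nproc_limit`) are hypothetical and not part of Hadoop.

```python
import errno
import os
import re
import resource

# Error message copied from the node manager log quoted in this thread.
LOG_LINE = ("java.lang.OutOfMemoryError: Failed to create a thread: "
            "retVal -1073741830, errno 11")


def errno_from_log(line):
    """Return the integer errno embedded in the JVM message, or None."""
    match = re.search(r"errno (\d+)", line)
    return int(match.group(1)) if match else None


def describe_errno(code):
    """Map an errno number to (symbolic name, OS message) on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def nproc_limit():
    """Soft and hard per-user process/thread limits (what `ulimit -u` reports)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    code = errno_from_log(LOG_LINE)
    name, message = describe_errno(code)
    print(f"errno {code} = {name} ({message})")
    soft, hard = nproc_limit()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
```

Run it as the same user that starts the node manager; if the soft limit is close to the number of threads and processes that around 100 containers would need, that limit is one plausible cause to rule out alongside native memory.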