MapReduce, mail # user - Container allocation fails randomly


Re: Container allocation fails randomly
Krishna Kishore Bonagiri 2013-09-17, 09:47
Hi Omkar,

  Thanks for the quick reply, and sorry for not being able to get the
required logs that you have asked for.

  In the meantime, I wanted to check whether the information I have now
gives you a clue. I see the following error message in AppMaster.stderr
whenever this failure happens. I don't know why it happens: the
getProgress() call that I have implemented in RMCallbackHandler could
never return a negative value! Doesn't this error mean that getProgress()
is returning a negative value?

Exception in thread "AMRM Heartbeater thread" java.lang.IllegalArgumentException: Progress indicator should not be negative
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:199)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
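
One possibility worth ruling out, offered only as a guess because the
getProgress() implementation is not shown here: if the check behind this
message has the form "progress >= 0", then a NaN value fails it just like a
negative one, and NaN is what a float division such as 0f / 0f produces
(for example completed / total before any container has been requested).
The sketch below is a defensive guard under that assumption. ProgressGuard,
clamp and ratio are hypothetical names, not part of the Hadoop API.

```java
// Hypothetical helper for illustration only; not part of Hadoop or YARN.
public final class ProgressGuard {
    private ProgressGuard() {}

    /** Maps NaN and negative values to 0 and caps the result at 1. */
    public static float clamp(float raw) {
        if (Float.isNaN(raw) || raw < 0.0f) {
            return 0.0f;
        }
        return Math.min(raw, 1.0f);
    }

    /** Computes completed / total without ever dividing by zero. */
    public static float ratio(int completed, int total) {
        if (total <= 0) {
            return 0.0f; // nothing requested yet, so report no progress
        }
        return clamp((float) completed / total);
    }
}
```

A getProgress() callback could then return something like
ProgressGuard.ratio(completedContainers, totalContainers), where both
counters are assumed to be fields maintained by the callback handler.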

Thanks,
Kishore
On Fri, Sep 13, 2013 at 2:59 AM, Omkar Joshi <[EMAIL PROTECTED]> wrote:

> Can you give more information? Complete logs from around this time frame
> will help a lot. Are the containers getting assigned via the scheduler? Is
> it failing when the node manager tries to start the container? I clearly
> see that the diagnostic message is empty, but do you see anything in the NM
> logs? Also, if there were running containers on the machine before
> launching new ones, were they killed, or are they still hanging around?
> Can you also try applying the patch
> "https://issues.apache.org/jira/browse/YARN-1053" and check whether you
> see any message?
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Thu, Sep 12, 2013 at 6:15 AM, Krishna Kishore Bonagiri <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>   I am using 2.1.0-beta and have seen container allocation failing
>> randomly even when running the same application in a loop. I know that the
>> cluster has enough resources to give, because it gave the resources for the
>> same application all the other times in the loop and ran it successfully.
>>
>>    I have observed many messages like the following in the node
>> manager's log whenever such a failure happens. Any clues as to why?
>>
>> 2013-09-12 08:54:36,204 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
>> C_RUNNING diagnostics: "" exit_status: -1000
>> 2013-09-12 08:54:37,220 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
>> C_RUNNING diagnostics: "" exit_status: -1000
>> 2013-09-12 08:54:38,231 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
>> C_RUNNING diagnostics: "" exit_status: -1000
>> 2013-09-12 08:54:39,239 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
>> C_RUNNING diagnostics: "" exit_status: -1000
>> 2013-09-12 08:54:40,267 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
>> C_RUNNING diagnostics: "" exit_status: -1000
>> 2013-09-12 08:54:41,275 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
>> out status for container: container_id { app_attempt_id { application_id {
>> id