MapReduce >> mail # user >> Job running on YARN gets automatically killed after 10-12 minutes

Krishna Kishore Bonagiri 2012-11-05, 16:32
Re: Job running on YARN gets automatically killed after 10-12 minutes
Is this your custom application and not, say, MapReduce or the distributed shell?

If that is the case, the ApplicationMaster needs to heartbeat to the ResourceManager periodically so that the RM knows it is still alive. It does this by making an allocate(..) call, which is part of the scheduler API. You should make this call regardless of whether you have any new container requests.

The default liveliness interval is 10 minutes, which matches the "Timed out after 600 secs" line in your RM log, so your app is getting killed after roughly that much time.
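
To illustrate the periodic heartbeat described above, here is a minimal sketch that uses only the JDK. RmHeartbeat is a hypothetical stand-in for whatever object issues the real allocate(..) call in your ApplicationMaster; it is not the Hadoop interface, and AmHeartbeater and its method names are likewise made up for this example. The only point it shows is that the call is issued on a timer, independent of container requests, and that one failed call does not stop later ones.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the AM-to-RM allocate(..) call; not a Hadoop type. */
interface RmHeartbeat {
    void allocate() throws Exception;
}

/** Issues allocate() at a fixed interval, whether or not containers are needed. */
final class AmHeartbeater implements AutoCloseable {
    // Daemon thread, so a forgotten heartbeater never keeps the JVM alive.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "am-heartbeat");
                t.setDaemon(true);
                return t;
            });
    private final AtomicInteger sent = new AtomicInteger();

    AmHeartbeater(RmHeartbeat rm, long intervalMs) {
        timer.scheduleAtFixedRate(() -> {
            try {
                rm.allocate();          // an empty request is enough to signal liveness
                sent.incrementAndGet();
            } catch (Exception e) {
                // Swallow the error: an exception escaping this task would
                // cancel all future runs and silently stop the heartbeats.
                System.err.println("heartbeat failed: " + e);
            }
        }, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Number of heartbeats that completed without an exception. */
    int heartbeatsSent() {
        return sent.get();
    }

    /** Blocks until at least n heartbeats succeeded or timeoutMs elapsed. */
    boolean awaitHeartbeats(int n, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (sent.get() < n) {
            if (System.nanoTime() > deadline) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

In a real ApplicationMaster you would pass a lambda that wraps your actual allocate(..) call and pick an interval well below the 10-minute expiry (a second or so is an assumption on my part, not a documented requirement), then close the heartbeater when the application finishes.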

+Vinod Kumar Vavilapalli
Hortonworks Inc.

On Nov 5, 2012, at 8:32 AM, Krishna Kishore Bonagiri wrote:

> Hi,
>   My job, which runs on the YARN framework, gets killed automatically after 10-12 minutes.
>   I have changed the monitoring time limit in Client.java, which comes with the distributed shell example, and also increased a set of interval parameters in $HADOOP_CONF_DIR/yarn-site.xml tenfold. The same kind of error still occurs.
> Note: I am not sending frequent heartbeats from the AM to the RM, nor frequent container requests to the RM.
> Content from RM's log:
> ====================>
> 2012-11-05 05:50:41,721 INFO  fifo.FifoScheduler (FifoScheduler.java:containerCompleted(721)) - Application appattempt_1352112580456_0001_000001 released container container_1352112580456_0001_01_000004 on node: host: isredeng:33055 #containers=2 available=4096 used=4096 with event: FINISHED
> 2012-11-05 06:03:03,855 INFO  util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(111)) - Expired:appattempt_1352112580456_0001_000001 Timed out after 600 secs
> 2012-11-05 06:03:03,867 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(483)) - appattempt_1352112580456_0001_000001 State change from RUNNING to FAILED
> Content from NM's log:
> =====================>
> 2012-11-05 06:03:04,364 INFO  containermanager.AuxServices (AuxServices.java:handle(160)) - Got event APPLICATION_STOP for appId application_1352112580456_0001
> 2012-11-05 06:03:04,373 INFO  application.Application (ApplicationImpl.java:handle(387)) - Application application_1352112580456_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> Is this behavior not controllable by any of the parameters in XML configuration files?
> Thanks & Regards,
> Kishore