MapReduce >> mail # user >> Re: Job end notification does not always work (Hadoop 2.x)


+
Alejandro Abdelnur 2013-06-24, 16:11
+
Devaraj k 2013-06-25, 03:42
+
Prashant Kommireddi 2013-06-25, 06:12
Re: Job end notification does not always work (Hadoop 2.x)
Devaraj,

If you don't run the HS, then once your jobs have finished you cannot
retrieve their status/counters, from either the Java API or the Web UI. So
I'd say that for any practical usage, you need it.

thx
On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <[EMAIL PROTECTED]> wrote:

>  It is not mandatory to have a running HS in the cluster. The user can
> still submit a job without an HS in the cluster, and may still expect the
> Job/App End Notification.
>
> Thanks
>
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:[EMAIL PROTECTED]]
> *Sent:* 24 June 2013 21:42
> *To:* [EMAIL PROTECTED]
> *Cc:* [EMAIL PROTECTED]
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If we ought to do this in a YARN service, it should be the RM or the HS.
> The RM is, IMO, the natural fit. The HS would be a good choice if we are
> concerned about the extra work this would cause in the RM. The problem
> with the current HS is that it is MR-specific; we should generalize it for
> different AM types.
>
> thx
>
> Alejandro
>
> (phone typing)
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <[EMAIL PROTECTED]> wrote:
>
> Even if we handle all the failure cases in the AM for Job End Notification,
> we may still miss cases such as an abrupt kill of the AM during its last
> retry. If we choose the NM to give the notification, the RM again needs to
> identify which NM should give the end notification, as we don't have any
> direct protocol between the AM and the NM.
>
> I feel it would be better to move the End-Notification responsibility to
> the RM as a YARN service, because it ensures 100% notification and is also
> useful for other types of applications.
>
> Thanks
>
> Devaraj K
>
> *From:* Ravi Prakash [mailto:[EMAIL PROTECTED] <[EMAIL PROTECTED]>]
> *Sent:* 23 June 2013 19:01
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested, i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> off including this as a YARN feature after all (especially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi
>
>    ------------------------------
>
> *From:* Alejandro Abdelnur <[EMAIL PROTECTED]>
> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If the AM fails before doing the job end notification, at any stage of the
> execution and for whatever reason, the job end notification will never be
> delivered. There is no way to fix this unless the notification is done by a
> YARN service. The two candidate services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, which rules the RM out unless we add, at AM registration time,
> the possibility to specify a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.
>
> thx
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <[EMAIL PROTECTED]>
> wrote:
>
> Prashant,
>
> Please file a JIRA.
>
> One thing to be aware of: AMs get restarted a certain number of times
> for fault tolerance, which means we can't just assume that failure of a
> single AM is equivalent to failure of the job.
>
> Only the ResourceManager is in the appropriate position to judge failure
> of the AM vs. failure of the job.
>
> hth,
>
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]>
> wrote:
Alejandro
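
For readers following the thread, here is a minimal sketch of the kind of best-effort callback being discussed: a process expands a callback URL and tries a bounded number of HTTP GETs. It is not Hadoop's implementation. The function names, defaults, and the `opener`/`sleep` parameters are hypothetical, and the `$jobId` / `$jobStatus` placeholders are assumed to follow the convention of the MapReduce job end notification URL. The sketch also illustrates the limitation raised above: if the process is killed before it reaches `notify_job_end`, nothing is ever sent.

```python
import time
import urllib.request


def expand_notification_url(template: str, job_id: str, job_status: str) -> str:
    """Substitute the $jobId and $jobStatus placeholders in a callback URL."""
    return template.replace("$jobId", job_id).replace("$jobStatus", job_status)


def notify_job_end(url, attempts=3, interval_s=1.0, timeout_s=5.0,
                   opener=urllib.request.urlopen, sleep=time.sleep) -> bool:
    """Best-effort GET of the callback URL with a bounded number of retries.

    Returns True on the first HTTP 200 response and False once all attempts
    are exhausted. `opener` and `sleep` are injectable only to keep the
    sketch testable without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError and HTTPError are subclasses of OSError.
            pass
        if attempt < attempts:
            sleep(interval_s)
    return False
```

Because the retries happen inside the notifying process, they only cover an unreachable receiver. They do not help when the notifier itself dies, which is why the thread considers moving the responsibility to a long-running service such as the RM or the HS.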
+
Devaraj k 2013-06-25, 13:11
+
Alejandro Abdelnur 2013-06-25, 13:21
+
Prashant Kommireddi 2013-06-22, 21:44