MapReduce, mail # user - Re: Job end notification does not always work (Hadoop 2.x)


Alejandro Abdelnur 2013-06-24, 16:11
Devaraj k 2013-06-25, 03:42
Prashant Kommireddi 2013-06-25, 06:12
Re: Job end notification does not always work (Hadoop 2.x)
Alejandro Abdelnur 2013-06-25, 12:35
Devaraj,

If you don't run the HS, once your jobs have finished you cannot retrieve
their status/counters, whether from the Java API or the Web UI. So I'd say
that for any practical usage, you need it.

thx
On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k <[EMAIL PROTECTED]> wrote:

>  It is not mandatory to have a running HS in the cluster. The user can
> still submit a job without an HS in the cluster, and may expect the
> Job/App End Notification.
>
> Thanks
>
> Devaraj k
>
> *From:* Alejandro Abdelnur [mailto:[EMAIL PROTECTED]]
> *Sent:* 24 June 2013 21:42
> *To:* [EMAIL PROTECTED]
> *Cc:* [EMAIL PROTECTED]
>
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If we ought to do this in a YARN service, it should be the RM or the HS.
> The RM is, IMO, the natural fit. The HS would be a good choice if we are
> concerned about the extra work this would cause in the RM. The problem
> with the current HS is that it is MR-specific; we should generalize it
> for different AM types.
>
> thx
>
> Alejandro
>
> (phone typing)
>
>
> On Jun 23, 2013, at 23:28, Devaraj k <[EMAIL PROTECTED]> wrote:
>
>  Even if we handle all the failure cases in the AM for Job End
> Notification, we may still miss cases such as an abrupt kill of the AM
> when it is on its last retry. If we choose the NM to give the
> notification, the RM again needs to identify which NM should send the
> end-notification, as we don't have any direct protocol between the AM
> and the NM.
>
> I feel it would be better to move the End-Notification responsibility to
> the RM as a YARN service, because it ensures 100% notification and is
> also useful for other types of applications.
>
> Thanks
>
> Devaraj K
>
> *From:* Ravi Prakash [mailto:[EMAIL PROTECTED] <[EMAIL PROTECTED]>]
> *Sent:* 23 June 2013 19:01
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> Hi Alejandro,
>
> Thanks for your reply! I was thinking more along the lines Prashant
> suggested, i.e. a failure during init() should still trigger an attempt to
> notify (by the AM). But now that you mention it, maybe we would be better
> off including this as a YARN feature after all (especially with all the new
> AMs being written). We could let the NM of the AM handle the notification
> burden, so that the RM doesn't get unduly taxed. Thoughts?
>
> Thanks
> Ravi
>
>    ------------------------------
>
> *From:* Alejandro Abdelnur <[EMAIL PROTECTED]>
> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> *Sent:* Saturday, June 22, 2013 7:37 PM
> *Subject:* Re: Job end notification does not always work (Hadoop 2.x)
>
> If the AM fails before doing the job end notification, at any stage of the
> execution and for whatever reason, the job end notification will never be
> delivered. There is no way to fix this unless the notification is done by a
> YARN service. The two candidate services for doing this would be the RM and
> the HS. The job notification URL is in the job conf. The RM never sees the
> job conf, which rules the RM out unless we add, at AM registration time,
> the possibility of specifying a callback URL. The HS has access to the job
> conf, but the HS is currently a 'passive' service.
>
> thx
>
> On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy <[EMAIL PROTECTED]>
> wrote:
>
> Prashant,
>
>  Please file a jira.
>
>  One thing to be aware of: AMs get restarted a certain number of times
> for fault tolerance, which means we can't just assume that the failure of
> a single AM is equivalent to the failure of the job.
>
>  Only the ResourceManager is in the appropriate position to judge failure
> of the AM vs. failure of the job.
>
> hth,
>
> Arun
>
> On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi <[EMAIL PROTECTED]>
> wrote:
Alejandro
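
For readers following the thread, the two constraints discussed above (the callback URL lives in the job conf, and a failed AM attempt is not necessarily a failed job) can be sketched roughly as below. This is only an illustration, not how the RM or HS is implemented: `AttemptReport`, `final_status` and `notification_url` are hypothetical names, and the job conf is modelled as a plain dict. The property name `mapreduce.job.end-notification.url` and the `$jobId` / `$jobStatus` placeholders are the ones MapReduce uses for this feature.

```python
from dataclasses import dataclass
from typing import Optional

# MapReduce job conf key holding the callback URL template.
END_NOTIFICATION_URL_KEY = "mapreduce.job.end-notification.url"


@dataclass
class AttemptReport:
    """Hypothetical view of one AM attempt, as an RM-like service might see it."""
    attempt: int       # 1-based AM attempt number
    max_attempts: int  # configured limit on AM attempts
    succeeded: bool    # whether this attempt completed the job


def final_status(report: AttemptReport) -> Optional[str]:
    """Return a terminal job status, or None if the AM can still be retried."""
    if report.succeeded:
        return "SUCCEEDED"
    if report.attempt >= report.max_attempts:
        return "FAILED"
    return None  # a failed attempt is not yet a failed job


def notification_url(job_conf: dict, job_id: str,
                     report: AttemptReport) -> Optional[str]:
    """Expand the callback URL, but only once the job outcome is final."""
    template = job_conf.get(END_NOTIFICATION_URL_KEY)
    status = final_status(report)
    if template is None or status is None:
        return None
    return template.replace("$jobId", job_id).replace("$jobStatus", status)
```

With a conf such as `{"mapreduce.job.end-notification.url": "http://example.invalid/notify?id=$jobId&status=$jobStatus"}`, a failed first attempt out of two yields no URL, while a failed second attempt yields the expanded URL with `status=FAILED`.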
Devaraj k 2013-06-25, 13:11
Alejandro Abdelnur 2013-06-25, 13:21
Prashant Kommireddi 2013-06-22, 21:44