Mohit Anchlia 2012-12-22, 20:30
What's the best way to trigger an alert when jobs run for too long or have many failures? Is there a Hadoop command that can be used for this?
Mohit Anchlia 2012-12-22, 20:39
The best I can find so far is hadoop job -list.
Mohammad Tariq 2012-12-22, 20:44
MR web UI? Although we can't trigger anything from it, it provides all the info related to the jobs. I mean it would be easier to just go there and have a look at everything rather than opening the shell and typing the command.

I'm a bit lazy ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
Mohit Anchlia 2012-12-22, 20:49
I need alerting, though.
Nitin Pawar 2012-12-22, 20:52
You could just add an email alert to your workflow. For failures, you can use the retry feature with a set number of tries and then send an alert once the job has failed. (We used this for jobs running for over 5 hours and it worked well for us.)

--
Nitin Pawar
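The retry-then-alert idea can be sketched outside any particular workflow system. This is only an illustration, not what Nitin's workflow tool does: the function names, the default of three tries, and the local SMTP relay on localhost are all assumptions.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_with_retries(command, max_tries=3, runner=subprocess.call):
    """Run `command` until it exits with 0 or `max_tries` is used up.

    Returns (succeeded, attempts_made). `runner` is injectable so the
    retry logic can be exercised without launching a real job.
    """
    for attempt in range(1, max_tries + 1):
        if runner(command) == 0:
            return True, attempt
    return False, max_tries


def build_alert(command, tries, sender, recipient):
    """Build the failure email; addresses are supplied by the caller."""
    msg = EmailMessage()
    msg["Subject"] = "Job failed after %d tries" % tries
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Command: " + " ".join(command))
    return msg


def send_alert(msg, host="localhost"):
    """Send via an SMTP relay; 'localhost' is an assumed default."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A wrapper script would call run_with_retries with the job's launch command and, if it reports failure, pass the result of build_alert to send_alert.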
Ted Dunning 2012-12-22, 22:08
You can write a script to parse the output of hadoop job -list and send an alert.

The trick of putting a retry into your workflow system is a nice one. If your program won't allow multiple copies to run at the same time, and you re-invoke the program every hour, say, then 5 blocked retries imply that the previous invocation has been running for about 5 hours.
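A script along those lines might look like the sketch below. It assumes the whitespace-separated layout that hadoop job -list prints in Hadoop 1.x, with the job ID in the first column and the start time in epoch milliseconds in the third; other versions print different columns, so check the output on your cluster first. The function names and the 5-hour threshold are illustrative.

```python
import subprocess
import time

MAX_RUNTIME_SECONDS = 5 * 60 * 60  # illustrative threshold: 5 hours


def find_long_running(listing, now_ms, max_runtime_s=MAX_RUNTIME_SECONDS):
    """Return [(job_id, runtime_seconds)] for jobs running past the limit.

    `listing` is the text printed by `hadoop job -list`. Assumed columns:
    JobId, State, StartTime (epoch ms), then anything else.
    """
    overdue = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the summary and header lines; job rows start with "job_".
        if len(fields) < 3 or not fields[0].startswith("job_"):
            continue
        try:
            start_ms = int(fields[2])
        except ValueError:
            continue  # layout differs from the assumption; ignore the row
        runtime_s = (now_ms - start_ms) // 1000
        if runtime_s > max_runtime_s:
            overdue.append((fields[0], runtime_s))
    return overdue


def check_cluster():
    """Run the real command and print one line per overdue job."""
    result = subprocess.run(["hadoop", "job", "-list"],
                            capture_output=True, text=True, check=True)
    now_ms = int(time.time() * 1000)
    for job_id, runtime_s in find_long_running(result.stdout, now_ms):
        print("ALERT: %s has been running for %d s" % (job_id, runtime_s))
```

Calling check_cluster() from cron is one way to turn this into alerting, since cron typically mails any output a job prints to the job's owner.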
Ted Dunning 2012-12-22, 22:12
Also, I think that Oozie allows for timeouts in job submission. That might answer your need.
Marcin Mejran 2012-12-23, 16:08
Yeah, Oozie sounds like the best approach. I think "timeout" in Oozie refers to something different (stopping a coordinator if it hasn't started within X minutes), but the SLA mechanism should do what's asked for.

-Marcin
Junior Mint 2012-12-23, 23:58
Can anyone tell me how to unsubscribe from this mailing list?
Mohammad Tariq 2012-12-24, 07:58
What have you tried?

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/