Re: Jobs are still in running state after executing "hadoop job -kill jobId"
Edward Capriolo 2011-07-05, 17:29
On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi <[EMAIL PROTECTED]> wrote:
> We sometimes have hundreds of map or reduce tasks for a job. I think it is
> hard to find all of them and kill the corresponding JVM processes. If we do
> not want to restart Hadoop, is there any automatic method?
> 2011/7/5 <[EMAIL PROTECTED]>
> > Um kill -9 "pid" ?
> > -----Original Message-----
> > From: Juwei Shi [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, July 01, 2011 10:53 AM
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Jobs are still in running state after executing "hadoop job
> > -kill jobId"
> > Hi,
> > I faced a problem where jobs are still running after executing
> > "hadoop job -kill jobId". I rebooted the cluster but the job still
> > cannot be killed.
> > The hadoop version is 0.20.2.
> > Any idea?
> > Thanks in advance!
> > --
> > - Juwei
I do not think they pop up very often, but after days and months of running,
orphans can be alive. The way I would handle it is to write a check that runs
over Nagios (NRPE) and uses ps to look for Hadoop task processes that are
older than a certain age, such as 1 day or 1 week. Then you can decide whether
you want Nagios to terminate these orphans or do it by hand.
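A rough sketch of that kind of check might look like the script below. It assumes a GNU procps `ps` that supports the `etimes` (elapsed seconds) output field, and matches on `org.apache.hadoop.mapred.Child`, which is the main class of the per-task JVMs in the 0.20 line; the function name and thresholds are made up for illustration, and you would adjust the pattern for your own deployment.

```shell
#!/bin/sh
# Sketch of an NRPE-style check for orphaned Hadoop task JVMs.
# Assumptions: GNU procps ps (for the etimes field) and 0.20-era
# task JVMs whose command line contains org.apache.hadoop.mapred.Child.

# Print the PIDs of matching task JVMs older than $1 seconds.
check_orphans() {
    max="$1"
    # pid= / etimes= / args= suppress the header lines so awk sees raw rows:
    # column 1 = pid, column 2 = elapsed seconds, rest = command line.
    ps -eo pid=,etimes=,args= | \
        awk -v max="$max" \
            '/org\.apache\.hadoop\.mapred\.Child/ && $2 > max { print $1 }'
}

MAX_AGE=86400  # 1 day, in seconds

orphans=$(check_orphans "$MAX_AGE")
if [ -n "$orphans" ]; then
    echo "CRITICAL: task JVMs older than ${MAX_AGE}s: $orphans"
    # To have Nagios terminate them instead of just alerting, uncomment:
    # kill -9 $orphans
else
    echo "OK: no stale Hadoop task processes"
fi
```

Hooked up as an NRPE command, the `CRITICAL` branch would normally also `exit 2` so Nagios raises the alert; leaving the actual `kill -9` commented out keeps the check read-only until you trust the pattern.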