-
RE: Jobs are still in running state after executing "hadoop job -kill jobId"
Jeff.Schmitz@... 2011-07-05, 14:05
Um kill -9 "pid" ?
-----Original Message-----
From: Juwei Shi [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 01, 2011 10:53 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Jobs are still in running state after executing "hadoop job -kill jobId"
Hi,
I am facing a problem: jobs are still running after I execute "hadoop job -kill jobId". I rebooted the cluster, but the job still cannot be killed.
The hadoop version is 0.20.2.
Any idea?
Thanks in advance!
--
- Juwei
-
Re: Jobs are still in running state after executing "hadoop job -kill jobId"
Edward Capriolo 2011-07-05, 14:50
On Tue, Jul 5, 2011 at 10:05 AM, <[EMAIL PROTECTED]> wrote:
> Um kill -9 "pid" ?
>
> -----Original Message-----
> From: Juwei Shi [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 01, 2011 10:53 AM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Jobs are still in running state after executing "hadoop job -kill jobId"
>
> Hi,
>
> I faced a problem that the jobs are still running after executing "hadoop job -kill jobId". I rebooted the cluster but the job still can not be killed.
>
> The hadoop version is 0.20.2.
>
> Any idea?
>
> Thanks in advance!
>
> --
> - Juwei

This happens sometimes. A task gets orphaned from the TaskTracker and never goes away. It is a good idea to have a Nagios check for very old tasks, because the orphans slowly suck your memory away, especially if the task launches with a big Xmx. You really *should not* need to be nuking tasks like this, but occasionally it happens.
Edward
-
Re: Jobs are still in running state after executing "hadoop job -kill jobId"
Edward Capriolo 2011-07-05, 17:29
On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi <[EMAIL PROTECTED]> wrote:
> We sometimes have hundreds of map or reduce tasks for a job. I think it is hard to find all of them and kill the corresponding jvm processes. If we do not want to restart hadoop, is there any automatic methods?
>
> 2011/7/5 <[EMAIL PROTECTED]>
>
> > Um kill -9 "pid" ?
> >
> > -----Original Message-----
> > From: Juwei Shi [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, July 01, 2011 10:53 AM
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Jobs are still in running state after executing "hadoop job -kill jobId"
> >
> > Hi,
> >
> > I faced a problem that the jobs are still running after executing "hadoop job -kill jobId". I rebooted the cluster but the job still can not be killed.
> >
> > The hadoop version is 0.20.2.
> >
> > Any idea?
> >
> > Thanks in advance!
> >
> > --
> > - Juwei
I do not think they pop up very often, but after days and months of running, a few orphans can still be alive. The way I would handle it is to write a check that runs over Nagios (NRPE) and uses ps to look for Hadoop task processes that are older than a certain age, such as 1 day or 1 week. Then you can decide whether you want Nagios to terminate these orphans or whether to do it by hand.
Edward
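
A minimal sketch of the kind of check described above, written in Python; it is not a script from this thread. It assumes that task JVMs can be recognised by the main class org.apache.hadoop.mapred.Child on their command line and that one day is a sensible age limit; adjust both for your cluster. It only reports suspected orphans and returns a Nagios-style status (0 = OK, 1 = WARNING). It does not kill anything.

```python
#!/usr/bin/env python3
"""Nagios-style check for long-running Hadoop task JVMs (illustrative sketch)."""
import subprocess

# Assumption: task JVMs carry this main class in their command line.
TASK_MARKER = "org.apache.hadoop.mapred.Child"
# Assumption: anything older than one day is a suspected orphan.
MAX_AGE_SECONDS = 24 * 60 * 60


def parse_etime(etime):
    """Convert the ps elapsed-time format [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_old_tasks(ps_output, max_age=MAX_AGE_SECONDS, marker=TASK_MARKER):
    """Return (pid, age_seconds) for matching processes older than max_age."""
    old = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)   # pid, etime, full command line
        if len(fields) < 3 or marker not in fields[2]:
            continue
        pid, etime, _args = fields
        age = parse_etime(etime)
        if age > max_age:
            old.append((int(pid), age))
    return old


def main():
    """Run ps, print one status line, and return a Nagios exit code."""
    # Empty column names ("pid=" etc.) suppress the ps header line.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etime=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    old = find_old_tasks(out)
    if old:
        pids = ", ".join(str(pid) for pid, _ in old)
        print("WARNING: %d task process(es) older than %d s: %s"
              % (len(old), MAX_AGE_SECONDS, pids))
        return 1
    print("OK: no old Hadoop task processes found")
    return 0

# To use this as a plugin, finish the script with:
#   import sys; sys.exit(main())
```

Run over NRPE on each TaskTracker node, this gives you the list of PIDs to inspect. Whether to kill them automatically or by hand is the decision mentioned above.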