-
how to query JobTracker
Some Body 2010-06-17, 12:53
Hi All,
What are the steps to query the cluster for running jobs with a particular job name? My driver class always submits my job with a preset name:

    Job job = new Job(config, "My Job Name");
    ......
    return job.waitForCompletion(true) ? 0 : 1;
I want to set up a cron job to trigger the job submission, and I want to ensure that only one instance of my job is running. I could surely do this via a shell wrapper, but I'd rather implement it in my driver class, i.e. call getAllJobs on the JobTracker, check for "My Job Name", and kill the old job before submitting a new one.
I'm using Cloudera's Hadoop 0.20.2+228.
Thanks, Alan
-
Re: how to query JobTracker
Jeff Zhang 2010-06-17, 14:42
Use JobClient.submitJob(JobConf job); this method returns a RunningJob. You can then call RunningJob.isComplete() to check whether the previous job has finished.
-- Best Regards
Jeff Zhang
-
Re: how to query JobTracker
Sanel Zukan 2010-06-17, 14:49
AFAIK, there is no such method (to get a job name from the client side) :( (at least I wasn't able to find one). The job name can be extracted via JobProfile for a given job ID, but only the JobTracker can access it (if you try to instantiate a JobTracker yourself, you will start your own job tracker).
The only solution is to query directly via the job ID that you received when the job was started.
-
Re: Re: how to query JobTracker
Some Body 2010-06-17, 15:12
Thanks Sanel,
Assuming my driver class would always use a "custom" job ID like "MyCustomJob" instead of "job_<YYYYMMDDHHMM>_<nnnn>" e.g. job_201006171232_0004, which is the default, how would I then query for the jobID?
Seems like it might just be easier to have my driver class submit the job, write the job ID to a lock file (hdfs://myapp/myjob.lock), and then:
a. remove the lock file when the job finishes, or
b. if a new job is triggered before the first has finished, read the job ID from the lock file, kill the previous job, and start a new one.
Alan
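The lock-file steps (a) and (b) above can be sketched roughly as follows. This is a minimal illustration, not a tested Hadoop driver: `JobCluster` is a hypothetical interface standing in for whatever submit/status/kill calls the driver really uses, and a local `java.nio.file` path stands in for the HDFS lock file named in the post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the cluster calls; this is not a Hadoop interface. */
interface JobCluster {
    String submit() throws IOException;                 // returns the new job ID
    boolean isRunning(String jobId) throws IOException; // true while the job is active
    void kill(String jobId) throws IOException;
}

final class LockFileDriver {
    private final Path lockFile;      // assumption: local file instead of HDFS
    private final JobCluster cluster;

    LockFileDriver(Path lockFile, JobCluster cluster) {
        this.lockFile = lockFile;
        this.cluster = cluster;
    }

    /** Step (b): kill the job named in the lock file if it is still running, then submit. */
    String submitExclusive() throws IOException {
        if (Files.exists(lockFile)) {
            String previous = readLock();
            if (!previous.isEmpty() && cluster.isRunning(previous)) {
                cluster.kill(previous);
            }
        }
        String jobId = cluster.submit();
        // Overwrites any stale lock with the ID of the job just submitted.
        Files.write(lockFile, jobId.getBytes(StandardCharsets.UTF_8));
        return jobId;
    }

    /** Step (a): remove the lock, but only if it still names the job that finished. */
    void releaseIfOwner(String jobId) throws IOException {
        if (Files.exists(lockFile) && readLock().equals(jobId)) {
            Files.delete(lockFile);
        }
    }

    private String readLock() throws IOException {
        return new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
    }
}
```

Note that the check-then-write sequence is not atomic, so two drivers started at almost the same moment could both submit. With cron intervals much longer than the submission time this may be acceptable, but that is a judgment call for the deployment.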
-
Re: Re: how to query JobTracker
Sanel Zukan 2010-06-17, 16:07
JobClient can connect directly to the job tracker address (see the JobClient constructor with an InetSocketAddress parameter). After that, getAllJobs() will return the known jobs, and you will be able to find your job ID there.
I would go with a solution similar to the one you proposed: write a lock containing the job ID and, when a second job starts, fetch the currently running jobs, find my ID, check whether it is still running, and decide what to do next.
PS: I'm not sure you will be able to construct a custom job ID from the client side ;)
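For the "fetch the jobs and find mine" step, the selection logic itself is small. The sketch below shows only that logic on plain Java objects: `JobInfo` and `JobNameFilter` are hypothetical names, not Hadoop classes. In the old org.apache.hadoop.mapred API the list could presumably be filled from JobClient.getAllJobs() together with JobClient.getJob(JobID) and RunningJob's getJobName() and isComplete(), but that mapping is an assumption and has not been checked against 0.20.2+228.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one job's state; not a Hadoop class. */
final class JobInfo {
    final String id;
    final String name;
    final boolean complete;

    JobInfo(String id, String name, boolean complete) {
        this.id = id;
        this.name = name;
        this.complete = complete;
    }
}

final class JobNameFilter {
    /** Returns the IDs of jobs that carry the given name and have not yet completed. */
    static List<String> runningIdsByName(List<JobInfo> jobs, String jobName) {
        List<String> ids = new ArrayList<String>();
        for (JobInfo job : jobs) {
            if (!job.complete && jobName.equals(job.name)) {
                ids.add(job.id);
            }
        }
        return ids;
    }
}
```

A driver could run this check before submitting and then kill each returned ID, or simply exit if the list is non-empty, depending on whether the old or the new run should win.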