-
Jobs randomly not starting
Robert Dyer 2012-07-13, 04:03
I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2 compute nodes). My input is a sequence file of around 280 MB.
Generally, my jobs run just fine and all finish in 2-5 minutes. However, quite randomly the jobs refuse to run. They submit and appear when running 'hadoop job -list' but don't appear on the jobtracker's webpage. If I manually type in the job ID on the webpage I can see it is trying to run the setup task - the map tasks haven't even started. I've left such jobs running, and even after several minutes they are still in this state.
When I spot this, I kill the job and resubmit it and generally it works.
A couple of times I have seen similar problems with reduce tasks that get stuck while 'initializing'.
Any ideas?
-
Re: Jobs randomly not starting
Bejoy KS 2012-07-13, 04:38
Hi Robert
It could be because there are no free slots available in your cluster during job submission time to launch those tasks. Some other tasks may have already occupied the map/reduce slots.
When you experience this random issue please verify whether there are free task slots available.
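As a rough illustration of that check, the sketch below computes how many slots are free from the per-node slot limit and the number of tasks currently running. The function name and the sample numbers are hypothetical. The limit of 2 map and 2 reduce slots per TaskTracker is the Hadoop 1.x default (mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum), so adjust it if your configuration overrides those; the running-task counts would have to be read off the jobtracker's webpage.

```python
def free_slots(nodes, slots_per_node, running_tasks):
    """Return how many task slots are still free (never negative)."""
    capacity = nodes * slots_per_node
    return max(capacity - running_tasks, 0)


# Hypothetical numbers for a cluster with 2 compute nodes.
map_free = free_slots(nodes=2, slots_per_node=2, running_tasks=4)
reduce_free = free_slots(nodes=2, slots_per_node=2, running_tasks=1)
print("free map slots:", map_free)
print("free reduce slots:", reduce_free)
```

If the map figure is 0 while a job sits in setup, slot starvation is a plausible explanation; if slots are free, the cause is probably elsewhere.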
Regards Bejoy KS
-
Re: Jobs randomly not starting
Harsh J 2012-07-13, 06:04
Hey Robert,
Any chance you can pastebin the JT logs, grepped for the bad job ID, and send the link across? They shouldn't hang the way you describe.
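If plain grep is not handy, a minimal Python sketch of the same filtering might look like the following. The function name is hypothetical and the sample lines are made up for illustration; they are not real JobTracker output. The only fact relied on is that task and attempt IDs reuse the job's numeric suffix (job_X_Y becomes attempt_X_Y_...), so matching on that suffix also catches lines about the job's tasks.

```python
def lines_for_job(lines, job_id):
    """Yield only the log lines that mention the given job or its tasks."""
    # job_201207130403_0001 -> 201207130403_0001, which also appears
    # inside task_... and attempt_... IDs belonging to the same job.
    suffix = job_id[len("job_"):] if job_id.startswith("job_") else job_id
    for line in lines:
        if suffix in line:
            yield line


# Made-up sample lines, only to show the filtering.
sample = [
    "2012-07-13 04:03:05,120 INFO example: job_201207130403_0001 submitted",
    "2012-07-13 04:03:06,000 INFO example: job_201207130403_0002 submitted",
    "2012-07-13 04:03:09,480 INFO example: attempt_201207130403_0001_m_000002_0 assigned",
]
for line in lines_for_job(sample, "job_201207130403_0001"):
    print(line)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.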
-- Harsh J
-
Re: Jobs randomly not starting
Robert Dyer 2012-07-17, 20:27
Upon further inspection of that log, it appears the problem is that the setup task just takes a very long time.
Typically it takes at most 6 seconds, but sometimes (the cases where I thought it was hanging) it actually runs and finishes, but takes 3-5 minutes.
Same problem with the cleanup (which is where I thought the reduce was getting stuck).
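For anyone wanting to measure this the same way, here is a minimal sketch that reports the time between the first and last log line mentioning a given task attempt. It assumes each line starts with a log4j ISO8601-style timestamp such as "2012-07-17 20:00:00,000"; the function name, the attempt ID, and the sample lines are hypothetical and not taken from my actual log.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2012-07-17 20:00:00,000")


def elapsed_seconds(lines, task_id):
    """Seconds between the first and last log line mentioning task_id."""
    stamps = []
    for line in lines:
        if task_id not in line:
            continue
        try:
            stamps.append(datetime.strptime(line[:TS_LENGTH], TS_FORMAT))
        except ValueError:
            continue  # line does not start with a timestamp, skip it
    if len(stamps) < 2:
        return None  # not enough data to compute a duration
    return (max(stamps) - min(stamps)).total_seconds()


# Made-up sample lines for a hypothetical setup attempt.
attempt = "attempt_201207172000_0001_m_000002_0"
sample = [
    "2012-07-17 20:00:00,000 INFO example: " + attempt + " starting",
    "2012-07-17 20:03:30,500 INFO example: " + attempt + " done",
]
print(elapsed_seconds(sample, attempt))
```

Running this over the setup and cleanup attempts of a fast job and a slow job makes the 6-second versus 3-5 minute difference easy to compare.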
I am currently the only user on this cluster and I never have more than 1 job in the queue at a time.
Ideas?