-
Delays in worker node jobs
Terry Healy 2012-08-29, 13:40
Running 1.0.2, in this case on Linux.
I was watching the processes / loads on one TaskTracker instance and noticed that it completed its first 8 map tasks and reported 8 free slots (the max for this system). It then sat idle for more than 30 seconds before the next "batch" of work came in and started running.
Likewise it also has relatively long periods with all 8 cores running at or near idle. There are no jobs failing or obvious errors in the TaskTracker log.
What could be causing this?
Should I raise the number of map slots above the number of cores to try to keep it busier?
-Terry
-
Re: Delays in worker node jobs
Harsh J 2012-08-29, 16:01
Hey Terry,
Can you look at your JobTracker logs, grep them for this worker node's hostname, and compare the task assignment timestamps with when each task actually began (from the TaskTracker log, grepping for the same attempt ID)?
-- Harsh J
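The comparison suggested above could be sketched roughly as follows. This is a minimal sketch, not a tool from the thread: it assumes both logs use a log4j-style timestamp prefix such as "2012-08-29 13:40:01,123" and that the first line mentioning an attempt ID in each log approximates assignment (JobTracker) and start (TaskTracker). The function names `first_seen` and `assignment_delays` are hypothetical.

```python
import re
from datetime import datetime

# Assumption: each log line starts with "YYYY-MM-DD HH:MM:SS,mmm".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")
# Task attempt IDs look like attempt_201208291340_0001_m_000008_0.
ATTEMPT = re.compile(r"attempt_\d+_\d+_[mr]_\d+_\d+")


def first_seen(lines, must_contain=None):
    """Map each attempt ID to the timestamp of the first line mentioning it."""
    seen = {}
    for line in lines:
        if must_contain is not None and must_contain not in line:
            continue
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # skip continuation lines such as stack traces
        when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S")
        when = when.replace(microsecond=int(stamp.group(2)) * 1000)
        for attempt in ATTEMPT.findall(line):
            seen.setdefault(attempt, when)
    return seen


def assignment_delays(jobtracker_lines, tasktracker_lines, hostname):
    """Seconds from the first JobTracker mention (on lines naming this
    host) to the first TaskTracker mention of the same attempt ID."""
    assigned = first_seen(jobtracker_lines, must_contain=hostname)
    started = first_seen(tasktracker_lines)
    return {
        attempt: (started[attempt] - assigned[attempt]).total_seconds()
        for attempt in assigned
        if attempt in started
    }
```

Passing the open log files (or lists of lines) for the JobTracker and the TaskTracker, plus the worker's hostname, yields one delay per attempt. Large values would point at launch overhead on the node; small values would suggest the wait happened before assignment.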
-
Re: Delays in worker node jobs
Terry Healy 2012-08-30, 01:20
Thanks guys. Unfortunately I had started the datanode by local command rather than from start-all.sh, so the related parts of the logs were lost. I was watching the cpu loads on all 8 cores via gkrellm at the time and they were definitely quiet. After a few minutes the jobs seemed to get in sync and it ran under a reasonable load (i.e. all cores mostly busy, with only brief gaps between tasks) for the rest of the job.
I will attempt to re-create tomorrow with proper logging. I will look into enabling Hadoop metrics.
-Terry
On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage.
>
> Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestamps of idleness on the nodes with the job-load at that point of time.
>
> HTH,
> +vinod
--
Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
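Matching node idleness against job load, as suggested in the quoted reply, needs the idle periods as timestamps first. The following is a minimal sketch assuming CPU samples have already been collected as (seconds, percent busy) pairs in time order. The function name `idle_windows` and both thresholds are illustrative, with the 30-second minimum taken from the gap described at the start of the thread.

```python
def idle_windows(samples, threshold=10.0, min_duration=30.0):
    """Return (start, end) pairs where CPU stayed below `threshold`
    percent for at least `min_duration` seconds.

    `samples` is a time-ordered list of (timestamp_seconds, cpu_percent).
    """
    windows = []
    start = None  # first sample of the current idle run
    last = None   # most recent idle sample in that run
    for when, cpu in samples:
        if cpu < threshold:
            if start is None:
                start = when
            last = when
        else:
            if start is not None and last - start >= min_duration:
                windows.append((start, last))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and last - start >= min_duration:
        windows.append((start, last))
    return windows
```

Each window can then be checked against the number of running map/reduce tasks the JT UI showed for the same period: idle cores while tasks were pending cluster-wide would point at scheduling latency rather than a lack of work.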
-
Re: Delays in worker node jobs
Steve Loughran 2012-08-30, 11:49
If you increase the rate at which the TTs heartbeat to the JobTracker, they may pick up work more often.
The JT only hands out work when either:
- the TT reports a task completion
- the TT heartbeats in
This design scales well for large clusters, but can add startup latency on small ones.
steve
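To get a feel for how heartbeat spacing translates into idle slots, here is a toy model. It is not Hadoop's scheduler: it assumes work is handed out only at evenly spaced heartbeats at 0, interval, 2 * interval and so on, and it ignores completion-triggered reports and network latency. The function names and the interval values are hypothetical, not Hadoop defaults.

```python
import math


def idle_before_next_heartbeat(finish_time, heartbeat_interval):
    """Seconds a freed slot waits for the next heartbeat in the toy model."""
    if heartbeat_interval <= 0:
        raise ValueError("heartbeat_interval must be positive")
    # Heartbeats fall on multiples of the interval; find the next one.
    next_beat = math.ceil(finish_time / heartbeat_interval) * heartbeat_interval
    return next_beat - finish_time


def total_idle(finish_times, heartbeat_interval):
    """Sum of slot-seconds lost waiting across all finished tasks."""
    return sum(
        idle_before_next_heartbeat(t, heartbeat_interval) for t in finish_times
    )
```

In this model a slot freed at t = 7 with a 3-second interval waits 2 seconds, and halving the interval can only shorten each wait. That is the intuition behind heartbeating more often on a small cluster.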