Re: Delays in worker node jobs
Steve Loughran 2012-08-30, 11:49
If you increase the rate at which the TaskTrackers (TTs) heartbeat to the
JobTracker (JT), they may pick up work more often.
The JT only hands out work when either:
- the TT reports a task completion, or
- the TT heartbeats in.
This design scales well for large clusters, but it can add startup
latency for small ones.
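To make the trade-off concrete, here is a toy model of heartbeat-driven scheduling. It assumes the JT budgets a fixed number of heartbeats per second across the whole cluster and enforces a minimum per-node interval; the function names and the default numbers (100 heartbeats/s, 3000 ms floor) are illustrative assumptions, not values taken from this thread or verified against any particular Hadoop release. It also assumes a freed slot is only noticed at the next regular heartbeat.

```python
def heartbeat_interval_ms(cluster_size: int,
                          heartbeats_per_second: int = 100,
                          min_interval_ms: int = 3000) -> int:
    """Per-node heartbeat interval in a simplified model (hypothetical).

    The JT spreads a fixed heartbeat budget over all nodes, so bigger
    clusters get longer intervals; the interval never drops below the
    floor, even when the cluster is tiny.
    """
    scaled_ms = 1000 * cluster_size // heartbeats_per_second
    return max(scaled_ms, min_interval_ms)


def mean_slot_idle_ms(interval_ms: int) -> float:
    """Average idle time of a freed slot, assuming completions are spread
    uniformly and only reported at the next regular heartbeat."""
    return interval_ms / 2


if __name__ == "__main__":
    for nodes in (4, 100, 1000):
        iv = heartbeat_interval_ms(nodes)
        print(f"{nodes:5d} nodes: heartbeat every {iv} ms, "
              f"freed slot idles ~{mean_slot_idle_ms(iv):.0f} ms on average")
```

In this model the floor is what protects the JT on a large cluster, but on a small cluster it is pure latency: the JT could easily absorb far more heartbeats, yet every freed slot still waits for the next one.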
On 30 August 2012 02:20, Terry Healy <[EMAIL PROTECTED]> wrote:
> Thanks guys. Unfortunately I had started the datanode by local command
> rather than from start-all.sh, so the related parts of the logs were
> lost. I was watching the cpu loads on all 8 cores via gkrellm at the
> time and they were definitely quiet. After a few minutes the jobs seemed
> to get in sync and it ran under a reasonable load (i.e. all cores mostly
> busy, with only brief gaps between tasks) for the rest of the job.
> I will attempt to re-create tomorrow with proper logging. I will look
> into enabling Hadoop metrics.
> On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> > Do you know if you have enough job-load on the system? One way to look
> > at this is to look for running map/reduce tasks on the JT UI at the same
> > time you are looking at the node's cpu usage.
> > Collecting Hadoop metrics via a metrics collection system, say Ganglia,
> > will let you match up the timestamps of idleness on the nodes with the
> > job-load at that point in time.
> > HTH,
> > +vinod
> > On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
> >> Running 1.0.2, in this case on Linux.
> >> I was watching the processes / loads on one TaskTracker instance and
> >> noticed that it completed its first 8 map tasks and reported 8 free
> >> slots (the max for this system). It then waited doing nothing for more
> >> than 30 seconds before the next "batch" of work came in and started.
> >> Likewise, it also has relatively long periods with all 8 cores running at
> >> or near idle. There are no failing jobs or obvious errors in the
> >> TaskTracker log.
> >> What could be causing this?
> >> Should I increase the number of map slots to more than the number of
> >> cores to try and keep it busier?
> >> -Terry
> Terry Healy / [EMAIL PROTECTED]
> Cyber Security Operations
> Brookhaven National Laboratory
> Building 515, Upton N.Y. 11973