Re: Delays in worker node jobs
Terry Healy 2012-08-30, 01:20
Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the CPU loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.
I will attempt to re-create this tomorrow with proper logging. I will look
into enabling Hadoop metrics.
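In the meantime, one rough way to line up idle periods with the job load,
as suggested below, is to note task start/end times for one node (for
example by hand from the JT UI) and look for stretches where nothing was
running. A minimal sketch in Python follows; the function name idle_gaps
and the sample timestamps are made up for illustration, and it does not
read Hadoop logs or metrics itself.

```python
from typing import List, Tuple

# (start, end) of one task on a node, in seconds since some reference time.
Interval = Tuple[float, float]


def idle_gaps(tasks: List[Interval], min_gap: float = 30.0) -> List[Interval]:
    """Return periods of at least min_gap seconds during which no task ran.

    Only gaps between the first task start and the last task end are
    reported; time before the first task or after the last is ignored.
    """
    gaps: List[Interval] = []
    busy_until = None  # latest end time seen so far among earlier tasks
    for start, end in sorted(tasks):
        if busy_until is not None and start - busy_until >= min_gap:
            gaps.append((busy_until, start))
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


if __name__ == "__main__":
    # Hypothetical timings: two overlapping tasks, a long pause, then more work.
    sample = [(0, 40), (2, 45), (80, 120), (121, 150)]
    for gap_start, gap_end in idle_gaps(sample):
        print(f"idle from {gap_start}s to {gap_end}s "
              f"({gap_end - gap_start}s with no task running)")
```

Any gap this reports can then be compared against what the JT UI showed
as pending or running at that time, to tell "no work available" apart
from "work available but not yet assigned to this node".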
On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to check is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's CPU usage.
> Collecting Hadoop metrics via a metrics collection system, say Ganglia, will let you match up the timestamps of idleness on the nodes with the job-load at that point in time.
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>> Running 1.0.2, in this case on Linux.
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed its first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>> What could be causing this?
>> Should I increase the number of map jobs to greater than the number of
>> cores to try to keep it busier?
Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973