MapReduce >> mail # user >> Jobtracker page hangs ..again.


Re: Jobtracker page hangs ..again.
Appreciate your input, Bryan. I will try to reproduce it and look at the
namenode log before, during, and after the pause.
Wish me luck.
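Bryan's suggestion below, checking whether a pause in the jobtracker log lines up with a pause in the namenode log, can be done mechanically. The following Python sketch assumes both logs use Hadoop's default log4j timestamp prefix (as in `2013-08-08 00:23:00,509 INFO ...`) and that the two machines' clocks are synchronized; `find_pauses` and `overlapping` are hypothetical helper names, not part of Hadoop.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumption: each log entry starts with Hadoop's default log4j timestamp,
# e.g. "2013-08-08 00:23:00,509 INFO ...". Lines without it (stack traces,
# wrapped messages) are treated as continuations and skipped.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

Pause = Tuple[datetime, datetime]


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timestamp at the start of a log entry, or None."""
    m = TIMESTAMP.match(line)
    if m is None:
        return None
    # %f right-pads the 3-digit milliseconds to microseconds.
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def find_pauses(lines: Iterable[str], threshold: timedelta) -> List[Pause]:
    """Return (last entry before, first entry after) for every gap > threshold."""
    pauses: List[Pause] = []
    prev: Optional[datetime] = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev is not None and ts - prev > threshold:
            pauses.append((prev, ts))
        prev = ts
    return pauses


def overlapping(a: List[Pause], b: List[Pause]) -> List[Tuple[Pause, Pause]]:
    """Pairs of pauses, one from each log, whose time windows overlap.

    Only meaningful if the two hosts' clocks are in sync (e.g. via NTP).
    """
    return [(p, q) for p in a for q in b if p[0] < q[1] and q[0] < p[1]]
```

For example, with the jobtracker and namenode log files opened as `jt` and `nn`, `overlapping(find_pauses(jt, timedelta(seconds=10)), find_pauses(nn, timedelta(seconds=10)))` lists jobtracker gaps that coincide with namenode gaps. The 10-second threshold is an arbitrary choice, and a gap only means nothing was logged, so treat matches as leads rather than proof.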
On Fri, Aug 9, 2013 at 2:09 PM, Bryan Beaudreault
<[EMAIL PROTECTED]> wrote:

> When I've had problems with a slow jobtracker, I've found the issue to be
> one of the following two (so far) possibilities:
>
> - long GC pause (I'm guessing this is not it based on your email)
> - HDFS is slow
>
> I haven't dived into the code yet, but circumstantially I've found that
> when you submit a job, the jobtracker needs to put a bunch of files in HDFS,
> such as the job.xml, the job jar, etc.  I'm not sure how this scales with
> larger and larger jobs, aside from the size of the splits serialization in
> the job.xml, but if your HDFS is slow for any reason it can cause pauses in
> your jobtracker.  This affects other jobs being able to submit, as well as
> the 50030 web ui.
>
> I'd take a look at your namenode logs.  When the jobtracker logs pause, do
> you see a corresponding pause in the namenode logs?  What gets spewed
> before and after that pause?
>
>
> On Fri, Aug 9, 2013 at 4:41 PM, Patai Sangbutsarakum <
> [EMAIL PROTECTED]> wrote:
>
>> A while back, I was fighting with jobtracker page hangs: when I browsed
>> to http://jobtracker:50030, the browser didn't show job info as usual. That
>> turned out to be caused by keeping too much job history in the jobtracker.
>>
>> Currently, I am setting up a new cluster with a 40 GB heap on the namenode
>> and the jobtracker, each on a dedicated machine. The not-so-fun part starts
>> here: a developer tested the cluster by launching a 76k-map job (the
>> cluster has around 6k mappers). Job initialization succeeded, and the job
>> finished.
>>
>> However, before the job actually started running, I couldn't access the
>> jobtracker page anymore: the same symptom as above.
>>
>> I see a bunch of these in the jobtracker log:
>>
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201307291733_0619_m_076796 has split on node: /rack/node
>> ..
>> ..
>> ..
>>
>> until I see this:
>>
>> INFO org.apache.hadoop.mapred.JobInProgress: job_201307291733_0619
>> LOCALITY_WAIT_FACTOR=1.0
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: Job
>> job_201307291733_0619 initialized successfully with 76797 map tasks and 10
>> reduce tasks.
>>
>> That's when I can access the jobtracker page again.
>>
>>
>> CPU load on the jobtracker is very light, and the jobtracker's heap is far
>> from full (about 1 GB of 40 GB). Network bandwidth is far from saturated.
>>
>> I'm running the 0.20.2 branch on CentOS 6.4 with Java(TM) SE Runtime
>> Environment (build 1.6.0_32-b05).
>>
>>
>> What root cause should I be looking at, or at least, where should I
>> start?
>>
>> Thank you in advance.
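To put a number on the hang described in the quoted message, the sketch below measures, from a jobtracker log, how long a given job's initialization took: from its first `has split on node` line to its `initialized successfully` line, which is roughly the window in which the page was unreachable. It assumes each log entry sits on a single line (the email wrapped them) and that the message text matches the `JobInProgress` lines quoted above; `init_summary` is a hypothetical helper name, not part of Hadoop.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
# Patterns modeled on the JobInProgress lines quoted in this thread.
# The task id embeds the job id: task_201307291733_0619_m_076796 -> 201307291733_0619.
SPLIT = re.compile(TS + r" INFO org\.apache\.hadoop\.mapred\.JobInProgress: "
                   r"tip:task_(\d+_\d+)_m_\d+ has split on node:")
INIT = re.compile(TS + r" INFO org\.apache\.hadoop\.mapred\.JobInProgress: "
                  r"Job job_(\d+_\d+) initialized successfully with "
                  r"(\d+) map tasks and (\d+) reduce tasks")


def _ts(s: str) -> datetime:
    # %f right-pads the 3-digit milliseconds to microseconds.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f")


def init_summary(lines: Iterable[str], job_id: str) -> Optional[dict]:
    """Summarize initialization of job_id, e.g. "job_201307291733_0619".

    Returns None if the job's "initialized successfully" line is not found.
    """
    want = job_id[len("job_"):]
    first_split: Optional[datetime] = None
    split_lines = 0
    for line in lines:
        m = SPLIT.match(line)
        if m and m.group(2) == want:
            split_lines += 1
            if first_split is None:
                first_split = _ts(m.group(1))
            continue
        m = INIT.match(line)
        if m and m.group(2) == want:
            done = _ts(m.group(1))
            seconds = None
            if first_split is not None:
                seconds = (done - first_split).total_seconds()
            return {
                "split_lines": split_lines,
                "maps": int(m.group(3)),
                "reduces": int(m.group(4)),
                "seconds": seconds,
            }
    return None
```

The `split_lines` count also shows how much INFO logging the jobtracker produced for the job during initialization, which grows with the number of map tasks.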