Re: Jobtracker page hangs ..again.
Appreciate your input, Bryan. I will try to reproduce it and look at the namenode
log before, during, and after the pause.
Wish me luck.
On Fri, Aug 9, 2013 at 2:09 PM, Bryan Beaudreault
<[EMAIL PROTECTED]> wrote:

> When I've had problems with a slow jobtracker, I've found the issue to be
> one of the following two (so far) possibilities:
>
> - long GC pause (I'm guessing this is not it based on your email)
> - hdfs is slow
>
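> To rule out the GC case, one thing that can help (a minimal sketch, assuming a
> stock hadoop-env.sh; the log path is just an example) is turning on GC logging
> for the JobTracker JVM and restarting it:
>
>   export HADOOP_JOBTRACKER_OPTS="-verbose:gc -XX:+PrintGCDetails \
>     -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop/jt-gc.log $HADOOP_JOBTRACKER_OPTS"
>
> Any multi-second pause should then show up in that log with a timestamp you can
> line up against the web UI hang.
>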
> I haven't dived into the code yet, but circumstantially I've found that
> when you submit a job the jobtracker needs to put a bunch of files in hdfs,
> such as the job.xml, the job jar, etc.  I'm not sure how this scales with
> larger and larger jobs, aside from the size of the splits serialization in
> the job.xml, but if your HDFS is slow for any reason it can cause pauses in
> your jobtracker.  This affects other jobs being able to submit, as well as
> the 50030 web ui.
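>
> A quick way to sanity-check HDFS latency from the jobtracker host (just a
> sketch; the file and paths here are arbitrary) is to time a small write and a
> listing, roughly what a job submission does:
>
>   time hadoop fs -put /etc/hosts /tmp/jt-latency-test
>   time hadoop fs -ls /tmp
>   hadoop fs -rm /tmp/jt-latency-test
>
> If those take more than a second or two while the UI is hung, HDFS/namenode is
> the place to dig.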
>
> I'd take a look at your namenode logs.  When the jobtracker logs pause, do
> you see a corresponding pause in the namenode logs?  What gets spewed
> before and after that pause?
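>
> It can also be worth grabbing a thread dump of the jobtracker while the page is
> hung (a sketch; find the pid with jps first, and substitute it below):
>
>   jps | grep JobTracker
>   jstack <jobtracker-pid> > /tmp/jt-threads.txt
>
> If most threads are blocked waiting on the JobTracker lock while one thread is
> down in HDFS client calls, that points back at slow HDFS.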
>
>
> On Fri, Aug 9, 2013 at 4:41 PM, Patai Sangbutsarakum <
> [EMAIL PROTECTED]> wrote:
>
>> A while back, I was fighting with the jobtracker page hanging when I browsed
>> to http://jobtracker:50030; the browser wouldn't show job info as usual. That
>> turned out to be caused by keeping too much job history in the jobtracker.
>>
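>> (The knob for that, assuming the stock 0.20 property name, is how many
>> completed jobs per user the jobtracker keeps in memory; in mapred-site.xml,
>> roughly:
>>
>>   <property>
>>     <name>mapred.jobtracker.completeuserjobs.maximum</name>
>>     <value>25</value>
>>   </property>
>>
>> with the value set to whatever keeps the retained-job memory reasonable.)
>>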
>> Currently, I am setting up a new cluster with a 40g heap on the namenode and
>> jobtracker, each on a dedicated machine. The not-fun part starts here: a
>> developer tried to test out the cluster by launching a 76k-map job (the
>> cluster has around 6k-ish mappers).
>> Job initialization succeeded, and the job finished.
>>
>> However, before the job is actually running, I can't access the jobtracker
>> page anymore; same symptom as above.
>>
>> I see a bunch of this in the jobtracker log:
>>
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201307291733_0619_m_076796 has split on node: /rack/node
>> ..
>> ..
>> ..
>>
>> Until I see this:
>>
>> INFO org.apache.hadoop.mapred.JobInProgress: job_201307291733_0619
>> LOCALITY_WAIT_FACTOR=1.0
>> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: Job
>> job_201307291733_0619 initialized successfully with 76797 map tasks and 10
>> reduce tasks.
>>
>> That's when I can access the jobtracker page again.
>>
>>
>> CPU load on the jobtracker is very light, the JT's heap is far from full
>> (about 1 gig out of 40), and network bandwidth is far from saturated.
>>
>> I'm running the 0.20.2 branch on CentOS 6.4 with Java(TM) SE Runtime
>> Environment (build 1.6.0_32-b05).
>>
>>
>> What would be the root cause I should be looking at, or at least where
>> should I start?
>>
>> Thank you in advance.
>>
>>
>>
>>
>