Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS, mail # user - Re: Jobtracker page hangs ..again.


Copy link to this message
-
Re: Jobtracker page hangs ..again.
Harsh J 2013-08-13, 03:01
If you're not already doing it, run a local name caching daemon (such
as ncsd, etc.) on each cluster node. Hadoop does a lot of lookups and
a local cache would do good in reducing the load on your DNS.

On Tue, Aug 13, 2013 at 3:09 AM, Patai Sangbutsarakum
<[EMAIL PROTECTED]> wrote:
> Update, after adjust the network routing, dns query speed is in micro sec as
> suppose to be. the issue is completely solve.
> Jobtracker page doesn't hang anymore when launch 100k mappers job..
>
> Cheers,
>
>
>
> On Mon, Aug 12, 2013 at 1:29 PM, Patai Sangbutsarakum
> <[EMAIL PROTECTED]> wrote:
>>
>> Ok, after some sweat, i think I found the root cause but still need
>> another team to help me fix it.
>> It lines on the DNS.  Somehow each of the tip:task line, through the
>> tcpdump, i saw the dns query to dns server. Timestamp seems matches to me.
>>
>> 2013-08-11 20:39:23,493 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201308111631_0006_m_000000 has split on node:/rack1/host1
>>
>> 127 ms
>>
>> 2013-08-11 20:39:23,620 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201308111631_0006_m_000000 has split on node:/rack1/host2
>>
>> 126 ms
>>
>> 2013-08-11 20:39:23,746 INFO org.apache.hadoop.mapred.JobInProgress:
>> tip:task_201308111631_0006_m_000000 has split on node:/rack2/host3
>>
>>
>> 20:39:23.367337 IP jtk.53110 > dns1.domain: 41717+ A? host1. (37)
>>
>> 20:39:23.367345 IP jtk.53110 > dns1.domain: 7221+ AAAA? host1. (37)
>>
>> 20:39:23.493486 IP dns1.domain > jtk.53110: 7221* 0/1/0 (89)
>>
>> 20:39:23.493505 IP dns1.domain > : jtk.41717* 1/4/2 A xx.xx.xx.xx (189)
>>
>>
>> 20:39:23.493766 IP jtk.48042 > dns1.domain: 35450+ A? host2. (37)
>>
>> 20:39:23.493774 IP jtk.48042 > dns1.domain: 56007+ AAAA? host2. (37)
>>
>> 20:39:23.619903 IP dns1.domain > jtk.48042: 35450* 1/4/2 A yy.yy.yy.yy
>> (189)
>>
>> 20:39:23.619921 IP dns1.domain > jtk.48042: 56007* 0/1/0 (89)
>>
>>
>> 20:39:23.620208 IP jtk.41237 > dns2.domain: 49511+ A? host3. (37)
>>
>> 20:39:23.620215 IP jtk.41237 > dns2.domain: 29199+ AAAA? host3. (37)
>>
>> 20:39:23.746358 IP dns2.domain > jtk.41237: 49511* 1/4/2 A zz.zz.zz.zz
>> (189)
>>
>> I looked at the jobtracker log in other datacenter when submitted with the
>> same query. Timestamp in each tip:task line is much much faster.
>>
>> The question that raise here is the job initialization is really request
>> the DNS, if so is there any way to suppress that. topology file is already
>> in place where name and ip are already there.
>>
>>
>> Hope this make sense
>>
>> Patai
>>
>>
>>
>>
>> On Fri, Aug 9, 2013 at 6:57 PM, Patai Sangbutsarakum
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Appreciate your input Bryant, i will try to reproduce and see the
>>> namenode log before, while, and after it pause.
>>> Wish me luck
>>>
>>>
>>> On Fri, Aug 9, 2013 at 2:09 PM, Bryan Beaudreault
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> When I've had problems with a slow jobtracker, i've found the issue to
>>>> be one of the following two (so far) possibilities:
>>>>
>>>> - long GC pause (I'm guessing this is not it based on your email)
>>>> - hdfs is slow
>>>>
>>>> I haven't dived into the code yet, but circumstantially I've found that
>>>> when you submit a job the jobtracker needs to put a bunch of files in hdfs,
>>>> such as the job.xml, the job jar, etc.  I'm not sure how this scales with
>>>> larger and larger jobs, aside form the size of the splits serialization in
>>>> the job.xml, but if your HDFS is slow for any reason it can cause pauses in
>>>> your jobtracker.  This affects other jobs being able to submit, as well as
>>>> the 50030 web ui.
>>>>
>>>> I'd take a look at your namenode logs.  When the jobtracker logs pause,
>>>> do you see a corresponding pause in the namenode logs?  What gets spewed
>>>> before and after that pause?
>>>>
>>>>
>>>> On Fri, Aug 9, 2013 at 4:41 PM, Patai Sangbutsarakum
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> A while back, i was fighting with the jobtracker page hangs when i

Harsh J