Re: map stuck at 99.99%
Hi Patai,
   I found a similar explanation in Google's MapReduce paper:

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf

   Please refer to Section 3.6, "Backup Tasks".
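Here is a toy Python model of the backup-task idea from that section. Near the end of a job, the master launches backup copies of the tasks still in progress, and a task counts as done as soon as either copy finishes. The durations and the backup launch time are made-up assumptions. The model also ignores slots, progress estimates and scheduling delays, so it is a sketch of the idea, not of how Hadoop's scheduler actually decides.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    primary_duration: float  # time the original attempt needs (assumed)
    backup_duration: float   # time a fresh backup attempt would need (assumed)

def job_finish_time(tasks, backup_launch_time=None):
    """Time at which every task has completed in this simplified model.

    All primaries start at t=0. If backup_launch_time is set, every task
    still running at that moment gets one backup copy, and the task is
    done when either copy finishes.
    """
    finish = []
    for t in tasks:
        done = t.primary_duration
        if backup_launch_time is not None and done > backup_launch_time:
            done = min(done, backup_launch_time + t.backup_duration)
        finish.append(done)
    return max(finish)

# m3 is a straggler because of its machine: a backup elsewhere is fast.
tasks = [Task("m1", 10, 10), Task("m2", 11, 11), Task("m3", 120, 12)]
print(job_finish_time(tasks))                          # 120
print(job_finish_time(tasks, backup_launch_time=15))  # 27

# If the slowness comes from the input itself, the backup is just as slow.
skewed = [Task("m1", 10, 10), Task("m3", 120, 120)]
print(job_finish_time(skewed, backup_launch_time=15))  # 120
```

The last case is the one to keep in mind here. Backups only help when the slowness belongs to the machine, not to the data the task is processing.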

Hope this helps.

Regards,

2013/3/1 Matt Davies <[EMAIL PROTECTED]>

> I've seen this before when the input data stream changes suddenly in a way
> that does not lend itself to parallelization, such as counting the number
> of tuples in a bag.
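One quick way to check for the kind of shift Matt describes is to sample the new input and see whether one key (or one bag) now dominates. The sketch below is plain Python over an in-memory sample. The key names are hypothetical, and reading the sample from HDFS is left out.

```python
from collections import Counter

def group_skew(keys):
    """Return (top_key, top_count, share_of_records) for a sample of record keys."""
    counts = Counter(keys)
    if not counts:
        raise ValueError("empty sample")
    top_key, top_count = counts.most_common(1)[0]
    return top_key, top_count, top_count / sum(counts.values())

# Hypothetical sample in which one key holds 90% of the records.
sample = ["user_a"] * 90 + ["user_b"] * 5 + ["user_c"] * 5
print(group_skew(sample))  # ('user_a', 90, 0.9)
```

If one group holds most of the records, the task that handles it does most of the work no matter how many tasks the job has. That task can keep running long after all the others have finished.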
>
> It may be interesting to compare the job counters from a previous run with
> those from the job that just completed. Do they differ? Is there a
> particular mapper whose counts are way out of whack?
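Matt's two checks can be written down as a small Python sketch. The first compares job-level counters between two runs. The second looks for a single map task whose counter is far above the median. The counter names are modeled on Hadoop's built-in task counters, and all numbers are hypothetical. How you export the counters (web UI, job history) is not shown here.

```python
from statistics import median

def changed_counters(prev, curr, ratio=2.0):
    """Counters whose value moved by at least `ratio`x between two runs."""
    changed = {}
    for name in sorted(set(prev) | set(curr)):
        a, b = prev.get(name, 0), curr.get(name, 0)
        lo, hi = min(a, b), max(a, b)
        if (lo == 0 and hi > 0) or (lo > 0 and hi / lo >= ratio):
            changed[name] = (a, b)
    return changed

def outlier_tasks(per_task, factor=5.0):
    """Task IDs whose counter is at least `factor` times the median task."""
    mid = median(per_task.values())
    return sorted(t for t, v in per_task.items() if mid > 0 and v >= factor * mid)

# Hypothetical numbers for illustration only.
prev = {"MAP_INPUT_RECORDS": 1000, "MAP_OUTPUT_RECORDS": 2000}
curr = {"MAP_INPUT_RECORDS": 1100, "MAP_OUTPUT_RECORDS": 9000}
print(changed_counters(prev, curr))  # {'MAP_OUTPUT_RECORDS': (2000, 9000)}
print(outlier_tasks({"m_0": 100, "m_1": 110, "m_2": 95, "m_3": 2000}))  # ['m_3']
```

The median is used instead of the mean so that a single huge task does not pull the baseline up and hide itself.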
>
> Has someone tweaked the production job in one way or another?
>
> On Thu, Feb 28, 2013 at 1:28 PM, Patai Sangbutsarakum <[EMAIL PROTECTED]> wrote:
>
>> > What type of CPU is on the box? The load average seems pretty high for an
>> > 8-core box.
>> Xeon 3.07GHz, 24 cores
>>
>> > Do you have Ganglia on these boxes? Is the load average always so high?
>> > What's the memory usage for the task and overall on the box?
>> From top -p <pid> of the task:
>> CPU 143.2%  MEM 1.7%
>> So it is not running out of memory, but the CPU is pretty pegged.
>>
>> >
>> > How long has the map task been running in that stuck state?
>> --> At least 2 hours.
>>
>>
>> It finally finished after hours; it took twice as long as usual today. T_T
>>
>> On Thu, Feb 28, 2013 at 1:18 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
>> > What type of CPU is on the box? The load average seems pretty high for an
>> > 8-core box. Do you have Ganglia on these boxes? Is the load average always
>> > so high? What's the memory usage for the task and overall on the box?
>> >
>> > How long has the map task been running in that stuck state? If it's been
>> > a few minutes, I am surprised that the JT (JobTracker) didn't try to run it
>> > on another node. Or have you switched off speculative execution?
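One way to answer the speculative-execution question is to look at the job's configuration (the job.xml Hadoop keeps for the job, or mapred-site.xml) for the map-side switches. Below is a small Python sketch that assumes the standard configuration layout of property elements with name and value children. The property names are the map-side switches for Hadoop 1.x and 2.x respectively, and both default to true. Check them against your version's mapred-default.xml.

```python
import xml.etree.ElementTree as ET

# Map-side speculative execution switches and their usual defaults:
# Hadoop 1.x name first, Hadoop 2.x name second.
SPECULATIVE_DEFAULTS = {
    "mapred.map.tasks.speculative.execution": "true",
    "mapreduce.map.speculative": "true",
}

def map_speculation(conf_xml):
    """Map speculation settings in one config file; missing keys get defaults."""
    root = ET.fromstring(conf_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in SPECULATIVE_DEFAULTS:
            found[name] = (prop.findtext("value") or "").strip()
    return {k: found.get(k, d) for k, d in SPECULATIVE_DEFAULTS.items()}

conf = """<configuration>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
</configuration>"""
print(map_speculation(conf))
# {'mapred.map.tasks.speculative.execution': 'false', 'mapreduce.map.speculative': 'true'}
```

A "true" here does not guarantee a backup attempt. The JobTracker still decides which tasks it considers slow enough to speculate.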
>> >
>> > Sorry, too many questions!
>> >
>> > You can try jstack and jmap. That will at least tell you what's getting
>> > blocked.
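For the jstack route, you can take several dumps of the stuck task's JVM a few seconds apart (jstack <pid>) and check which frames the RUNNABLE threads are sitting in. The rough Python helper below summarizes one dump. It assumes the usual thread-dump text layout: a quoted thread header line, then a "java.lang.Thread.State:" line, then "at ..." frame lines. The mapper class name in the demo is hypothetical.

```python
import re
from collections import Counter

STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")
FRAME_RE = re.compile(r"^\s*at (\S+)")

def summarize_dump(dump):
    """Count thread states and collect the top frame of each RUNNABLE thread."""
    states = Counter()
    runnable_tops = []
    state, top_seen = None, False
    for line in dump.splitlines():
        if line.startswith('"'):  # header line: a new thread starts here
            state, top_seen = None, False
            continue
        m = STATE_RE.match(line)
        if m:
            state = m.group(1)
            states[state] += 1
            continue
        f = FRAME_RE.match(line)
        if f and not top_seen:  # only the first frame of each thread
            top_seen = True
            if state == "RUNNABLE":
                runnable_tops.append(f.group(1))
    return states, runnable_tops

# Hypothetical one-thread dump for illustration.
demo = ('"main" prio=10 tid=0x1 nid=0x2 runnable\n'
        '   java.lang.Thread.State: RUNNABLE\n'
        '\tat com.example.MyMapper.map(MyMapper.java:42)\n')
print(summarize_dump(demo))
# (Counter({'RUNNABLE': 1}), ['com.example.MyMapper.map(MyMapper.java:42)'])
```

If the same frame from your own mapper code is the top of a RUNNABLE thread in every dump, that is most likely where the time is going.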
>> >
>> > On Thu, Feb 28, 2013 at 1:04 PM, Patai Sangbutsarakum <[EMAIL PROTECTED]> wrote:
>> >>
>> >> - Check the box on which the task is running: is it under heavy load?
>> >> Is there a high amount of I/O wait?
>> >> CPU: very warm, load average 47.47, 48.56, 49.00
>> >> I/O: calm, 0.1x% iowait, less than 20 tps, rarely up to 100 tps, on
>> >> 10 JBOD disks.
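Viral's first remark assumed 8 cores. With the actual 24 cores, it helps to normalize the load average per core and to compute iowait from two samples of the Linux /proc/stat "cpu" line. The Python sketch below does both. The load figures are the ones reported in this thread. The /proc/stat lines in the test are made-up samples.

```python
def load_per_core(load_averages, cores):
    """Normalize (1, 5, 15)-minute load averages by the number of cores."""
    return tuple(round(load / cores, 2) for load in load_averages)

def iowait_percent(cpu_line_1, cpu_line_2):
    """Share of CPU time spent in iowait between two /proc/stat 'cpu' lines.

    Field order on Linux: cpu user nice system idle iowait irq softirq steal ...
    Only the first eight counters are summed, because the later guest
    fields are already included in user/nice.
    """
    before = [int(v) for v in cpu_line_1.split()[1:9]]
    after = [int(v) for v in cpu_line_2.split()[1:9]]
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

# Load average from this thread, on a 24-core box.
print(load_per_core((47.47, 48.56, 49.00), 24))  # (1.98, 2.02, 2.04)
```

Roughly two runnable tasks per core means the box is oversubscribed. With iowait near zero, as reported above, that load is mostly CPU demand rather than processes blocked on disk.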
>> >>
>> >>
>> >> - You could check the task logs to see if they say anything about
>> >> what is going wrong.
>> >> I would say no; pretty much all of them are INFO.
>> >>
>> >> - Did the task get pre-empted to other TaskTrackers? If yes, is it
>> >> stuck at the same spot on those?
>> >> Nope.
>> >>
>> >> - What kind of work are you doing in the mapper? Just reading from
>> >> HDFS and computing something, or reading/writing from HBase?
>> >> HDFS + compute, R/W
>> >> Absolutely no HBase.
>> >>
>> >> Would jstack or jmap be of any use?
>> >>
>> >>
>> >>
>> >> On Thu, Feb 28, 2013 at 12:25 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
>> >> > You could start off doing the following:
>> >> >
>> >> > - Check the box on which the task is running: is it under heavy load?
>> >> > Is there a high amount of I/O wait?
>> >> > - You could check the task logs to see if they say anything about
>> >> > what is going wrong.
>> >> > - Did the task get pre-empted to other TaskTrackers? If yes, is it
>> >> > stuck at the same spot on those?