Re: map stucks at 99.99%
YouPeng Yang, 2013-03-02, 02:36
Hi Patai
I found a similar explanation in the Google MapReduce paper:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf
Please refer to section 3.6, "Backup Tasks".

Hope this helps.

Regards

2013/3/1 Matt Davies <[EMAIL PROTECTED]>
> I've seen this before when the input data stream changes suddenly and does
> not lend itself to parallelization, such as counting the number of tuples
> in a bag.
>
> One thing that may be interesting is the job counters from a previous job
> vs. this job that just completed. Do they differ? Is there a particular
> mapper that seems to have counts that are way out of whack?
>
> Has someone tweaked the production job in one way or another?
>
> On Thu, Feb 28, 2013 at 1:28 PM, Patai Sangbutsarakum <[EMAIL PROTECTED]> wrote:
>> > What type of CPU is on the box? The load average seems pretty high for
>> > an 8-core box.
>> Xeon 3.07GHz, 24 cores.
>>
>> > Do you have Ganglia on these boxes? Is the load average always so high?
>> > What's the memory usage for the task and overall on the box?
>> From top -p <pid of the task>:
>> CPU 143.2%, MEM 1.7%
>> So it is not running out of memory; the CPU is pretty pegged.
>>
>> > How long has the map task been running in that stuck state?
>> At least 2 hours.
>>
>> It finally finished after hours; it took twice as long as usual today. T_T
>>
>> On Thu, Feb 28, 2013 at 1:18 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
>> > What type of CPU is on the box? The load average seems pretty high for
>> > an 8-core box. Do you have Ganglia on these boxes? Is the load average
>> > always so high? What's the memory usage for the task and overall on the
>> > box?
>> >
>> > How long has the map task been running in that stuck state? If it's been
>> > a few minutes, I am surprised that the JT (JobTracker) didn't try to run
>> > it on another node, or have you switched off speculative execution?
>> >
>> > Sorry, too many questions!!
>> >
>> > You can try jstack and jmap. That will at least tell you what's getting
>> > blocked.
>> >
>> > On Thu, Feb 28, 2013 at 1:04 PM, Patai Sangbutsarakum <[EMAIL PROTECTED]> wrote:
>> >> - Check the box on which the task is running: is it under heavy load?
>> >> Is there a high amount of I/O wait?
>> >> CPU: very warm, load average: 47.47, 48.56, 49.00
>> >> I/O: quiet, 0.1x% iowait, fewer than 20 tps, rarely up to 100 tps, on a
>> >> 10-disk JBOD.
>> >>
>> >> - You could check the task logs and see if they say anything about what
>> >> is going wrong.
>> >> I would say no; pretty much all of them are INFO.
>> >>
>> >> - Did the task get pre-empted to other task trackers? If yes, is it
>> >> stuck at the same spot on those?
>> >> Nope.
>> >>
>> >> - What kind of work are you doing in the mapper? Just reading from HDFS
>> >> and computing something, or reading/writing from HBase?
>> >> HDFS + compute, R/W.
>> >> Absolutely no HBase.
>> >>
>> >> Would jstack or jmap be of any use?
>> >>
>> >> On Thu, Feb 28, 2013 at 12:25 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
>> >> > You could start off doing the following:
>> >> >
>> >> > - Check the box on which the task is running: is it under heavy load?
>> >> > Is there a high amount of I/O wait?
>> >> > - You could check the task logs and see if they say anything about
>> >> > what is going wrong.
>> >> > - Did the task get pre-empted to other task trackers? If yes, is it
>> >> > stuck at the same spot on those?
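Section 3.6 of the MapReduce paper describes backup tasks: when an operation is close to completion, the master schedules backup executions of the remaining in-progress tasks, and a task counts as completed as soon as either the primary or the backup copy finishes. Hadoop's speculative execution, which Viral asks about in the thread, is its version of this idea. The following is only a toy, single-process illustration under stated assumptions: threads stand in for task trackers, a sleep stands in for a straggling machine, and `map_task` and `run_with_backup` are hypothetical names. It is not how Hadoop's JobTracker actually schedules speculative attempts.

```python
from __future__ import annotations

import concurrent.futures as cf
import time


def map_task(split_id: int, delay_s: float) -> str:
    """Hypothetical stand-in for a map task: sleep to simulate work, then
    return a result that depends only on the input split, so a primary and
    a backup execution of the same split produce identical output."""
    time.sleep(delay_s)
    return f"output-of-split-{split_id}"


def run_with_backup(split_id: int, primary_delay_s: float,
                    backup_delay_s: float) -> tuple[str, str]:
    """Launch a primary and a backup execution of the same task and accept
    whichever finishes first. Returns (which copy won, its result)."""
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        attempts = {
            pool.submit(map_task, split_id, primary_delay_s): "primary",
            pool.submit(map_task, split_id, backup_delay_s): "backup",
        }
        done, _still_running = cf.wait(attempts, return_when=cf.FIRST_COMPLETED)
        winner = next(iter(done))
        label, result = attempts[winner], winner.result()
    # Threads cannot be killed, so leaving the `with` block waits for the
    # losing copy: a small stand-in for the extra resources backups cost.
    return label, result


if __name__ == "__main__":
    # A straggling primary (1 s) vs. a backup on a healthy worker (0.1 s).
    print(run_with_backup(7, primary_delay_s=1.0, backup_delay_s=0.1))
```

One limit is worth keeping in mind for this thread: a backup copy only helps when the slowness comes from the machine the task runs on. If the input split itself is expensive to process, as Matt suggests, the backup would be just as slow, which would be consistent with the task not being re-run elsewhere.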
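Patai's load figures are easier to judge per core: a load average of 47.47 / 48.56 / 49.00 on 24 cores is roughly 2 per core, meaning on average about twice as many tasks want a CPU as there are cores. On Linux the load average also counts tasks in uninterruptible I/O sleep, but with iowait around 0.1% most of this is presumably CPU demand. The helper names below (`load_per_core`, `local_load_per_core`) are hypothetical; `os.getloadavg` and `os.cpu_count` are standard-library calls.

```python
import os


def load_per_core(load_avgs, cores):
    """Normalize (1-, 5-, 15-minute) load averages by the core count.
    Values persistently above 1.0 mean more runnable (or, on Linux,
    uninterruptible) tasks than cores, i.e. tasks are queueing."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return tuple(round(load / cores, 2) for load in load_avgs)


def local_load_per_core():
    """Same check for the machine this runs on. Unix only: os.getloadavg
    is unavailable on Windows and raises OSError if it cannot be read.
    os.cpu_count() counts logical CPUs (hyperthreads included)."""
    return load_per_core(os.getloadavg(), os.cpu_count() or 1)


if __name__ == "__main__":
    # Figures quoted in the thread: 24 cores, load 47.47 / 48.56 / 49.00.
    print(load_per_core((47.47, 48.56, 49.00), 24))  # (1.98, 2.02, 2.04)
```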
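For Viral's jstack suggestion, one common approach (an assumption here, not something the thread spells out) is to take several dumps of the task JVM a few seconds apart with `jstack <pid>`, using the same pid passed to `top -p`, and look for threads that are RUNNABLE with the same top frame every time. The sketch assumes the usual HotSpot dump layout (a quoted thread name starting each entry, a `java.lang.Thread.State:` line, then `at ...` frames); `take_dumps`, `parse_dump` and `stuck_threads` are hypothetical helpers.

```python
import re
import subprocess
import time

# A thread entry in a HotSpot thread dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def take_dumps(pid: int, count: int = 3, interval_s: float = 5.0) -> list:
    """Run `jstack <pid>` `count` times, `interval_s` seconds apart.
    Needs a JDK's jstack on PATH and permission to attach to the JVM
    (usually: run as the same user that owns the task JVM)."""
    dumps = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        proc = subprocess.run(["jstack", str(pid)],
                              capture_output=True, text=True, check=True)
        dumps.append(proc.stdout)
    return dumps


def parse_dump(text: str) -> dict:
    """Map each thread name to (state, top stack frame) for one dump."""
    threads = {}
    name = state = frame = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            if name is not None:
                threads[name] = (state, frame)
            name, state, frame = header.group("name"), None, None
            continue
        if name is None:
            continue  # preamble before the first thread entry
        stripped = line.strip()
        if stripped.startswith("java.lang.Thread.State:"):
            state = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("at ") and frame is None:
            frame = stripped[3:]  # first "at ..." line is the top frame
    if name is not None:
        threads[name] = (state, frame)
    return threads


def stuck_threads(dumps: list) -> dict:
    """Threads that are RUNNABLE with an identical top frame in every dump.
    This is a hint about where a thread spends its time, not proof."""
    if len(dumps) < 2:
        raise ValueError("need at least two dumps to compare")
    parsed = [parse_dump(d) for d in dumps]
    common = set(parsed[0]).intersection(*parsed[1:])
    result = {}
    for name in sorted(common):
        snapshots = {p[name] for p in parsed}
        if len(snapshots) == 1:
            state, frame = snapshots.pop()
            if state == "RUNNABLE" and frame is not None:
                result[name] = frame
    return result
```

Usage would look like `stuck_threads(take_dumps(<task JVM pid>))`. Comparing only the top frame keeps the sketch short; a busy thread that loops through several methods may need deeper frames compared, and `jmap -histo <pid>` can complement this by showing which object types fill the heap.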