Patai Sangbutsarakum
2013-02-28, 20:06
Patai Sangbutsarakum
2013-02-28, 21:04
Viral Bajaria
2013-02-28, 21:18
Patai Sangbutsarakum
2013-02-28, 21:28
Matt Davies
2013-02-28, 22:10
YouPeng Yang
2013-03-02, 02:36
-
map stucks at 99.99%
Patai Sangbutsarakum 2013-02-28, 20:06
Hadoopers!!

Need input from you guys. I am looking at a critical job in production. It is stuck at 99.99% in the map phase for much longer than it used to be.

How can I debug what is going on with those maps and why they are not passing through? The tasks and task attempts say 100% progress, but there is no finish time.

Please suggest.
Patai
-
Re: map stucks at 99.99%
Patai Sangbutsarakum 2013-02-28, 21:04
On Thu, Feb 28, 2013 at 12:25 PM, Viral Bajaria <[EMAIL PROTECTED]> wrote:
> You could start off doing the following:
>
> - Check the box on which the task is running, is it under heavy load?
> Is there a high amount of I/O wait?
CPU: very warm, load average: 47.47, 48.56, 49.00
I/O: chill, 0.1x% iowait, less than 20 tps, rarely up to 100 tps, on 10 disks JBOD.

> - You could check the task logs and see if they say anything about
> what is going wrong?
I would say no, pretty much all of them are INFO.

> - Did the task get pre-empted to other task trackers? If yes, is it
> stuck at the same spot on those?
Nope.

> - What kind of work are you doing in the mapper? Just reading from
> HDFS and computing something, or reading/writing from HBase?
HDFS + compute, R/W.
Absolutely no HBase.

Would jstack or jmap be of any use?
-
Re: map stucks at 99.99%
Viral Bajaria 2013-02-28, 21:18
What type of CPU is on the box? The load average seems pretty high for an 8-core box. Do you have ganglia on these boxes? Is the load average always so high? What's the memory usage for the task and overall on the box?

How long has the map task been running in that stuck state? If it's been a few minutes, I am surprised that the JT didn't try to run it on another node, or have you switched off speculative execution?

Sorry, too many questions!!

You can try jstack and jmap. That will at least tell you what's getting blocked.
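One way to act on the jstack suggestion is to summarize the thread states in a dump before reading individual stack traces. The following is a minimal sketch, assuming the dump has already been captured as text (for example by running `jstack <pid>` against the task JVM). The sample dump, class names, and the helper `count_thread_states` are hypothetical and for illustration only.

```python
import re
from collections import Counter

# Matches the state line that jstack prints under each Java thread header.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump: str) -> Counter:
    """Count threads per state in the text of one jstack dump."""
    return Counter(STATE_RE.findall(dump))


# Hypothetical, shortened dump for illustration only (not from this thread).
SAMPLE_DUMP = '''
"main" prio=10 tid=0x01 nid=0x1a runnable [0x02]
   java.lang.Thread.State: RUNNABLE
        at com.example.MyMapper.map(MyMapper.java:42)

"IPC Client" daemon prio=10 tid=0x03 nid=0x1b in Object.wait() [0x04]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

"Thread-5" prio=10 tid=0x05 nid=0x1c waiting for monitor entry [0x06]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Helper.lock(Helper.java:7)
'''

states = count_thread_states(SAMPLE_DUMP)
print(dict(states))
```

Taking a few dumps several seconds apart and comparing them can help: a RUNNABLE thread that sits in the same user frame every time points at a hot loop, while many BLOCKED threads point at lock contention.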
-
Re: map stucks at 99.99%
Patai Sangbutsarakum 2013-02-28, 21:28
> What type of CPU is on the box? The load average seems pretty high for
> an 8-core box.
Xeon 3.07GHz, 24 cores

> Do you have ganglia on these boxes? Is the load average always so high?
> What's the memory usage for the task and overall on the box?
From top -p pid of the task:
CPU 143.2% MEM 1.7%
So memory is not running dry on this box; the CPU is pretty pegged.

> How long has the map task been running in that stuck state?
--> at least 2 hours.

It finally just finished after hours. It took double the usual time today.. T_T
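To put the quoted numbers in context, a load average is easier to judge once it is divided by the core count. This is a small sketch using only the figures from the thread (load average 47.47, 24 cores); the helper name `load_per_core` is an assumption made for illustration.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# Figures quoted in the thread: 1-minute load 47.47 on a 24-core Xeon box.
ratio = load_per_core(47.47, 24)
print(f"load per core: {ratio:.2f}")

# On a live Unix node the same check can use the standard library directly.
if hasattr(os, "getloadavg"):
    try:
        one_minute, _, _ = os.getloadavg()
        print(f"this host: {load_per_core(one_minute, os.cpu_count() or 1):.2f}")
    except OSError:
        print("load average is not available on this host")
```

A ratio well above 1.0 together with near-zero iowait is consistent with the box being CPU-bound rather than disk-bound, which matches the top output quoted above.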
-
Re: map stucks at 99.99%
Matt Davies 2013-02-28, 22:10
I've seen this before if the input data stream changes suddenly and does not lend itself to parallelization, such as counting the number of tuples in a bag.

One thing that may be interesting is the job counters from a previous job vs. this job that just completed. Do they differ? Is there a particular mapper that seems to have counts that are way out of whack?

Has someone tweaked the production job in one way or another?
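The counter comparison suggested here can be sketched in a few lines. This is a rough illustration, assuming the counters of both runs have already been copied into plain dictionaries (for example from the job pages). The counter names, task ids, values, thresholds, and the helpers `changed_counters` and `skewed_tasks` are all hypothetical.

```python
from statistics import median


def changed_counters(previous: dict, current: dict, factor: float = 2.0) -> dict:
    """Return counters whose value grew or shrank by at least `factor` between runs."""
    changed = {}
    for name in previous.keys() & current.keys():
        old, new = previous[name], current[name]
        if old == 0 or new == 0:
            # Avoid dividing by zero; any move to or from zero counts as a change.
            if old != new:
                changed[name] = (old, new)
            continue
        ratio = new / old
        if ratio >= factor or ratio <= 1 / factor:
            changed[name] = (old, new)
    return changed


def skewed_tasks(records_per_task: dict, factor: float = 5.0) -> list:
    """Return task ids whose record count exceeds `factor` times the median."""
    typical = median(records_per_task.values())
    return sorted(t for t, n in records_per_task.items() if n > factor * typical)


# Hypothetical counter values for illustration; not taken from the job in this thread.
yesterday = {"MAP_INPUT_RECORDS": 1_000_000, "MAP_OUTPUT_RECORDS": 1_200_000}
today = {"MAP_INPUT_RECORDS": 1_050_000, "MAP_OUTPUT_RECORDS": 9_600_000}
per_task_today = {
    "m_000000": 10_000,
    "m_000001": 11_000,
    "m_000002": 9_500,
    "m_000003": 400_000,
}

print(changed_counters(yesterday, today))
print(skewed_tasks(per_task_today))
```

A single task far above the median is the pattern to look for: it would explain a job that reports nearly 100% map progress while one mapper keeps a core busy for hours.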
-
Re: map stucks at 99.99%
YouPeng Yang 2013-03-02, 02:36
Hi Patai

I found a similar explanation in the Google MapReduce publication:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf

Please refer to section 3.6, Backup Tasks.

Hope this is helpful.
Regards
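The backup-task idea in that section (schedule a second copy of the tasks still running near the end of the job, and accept whichever copy finishes first) can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Hadoop's scheduler is implemented; the durations and function names are hypothetical.

```python
def job_finish_time(task_durations):
    """Without backups, the job ends when its slowest task ends."""
    return max(task_durations)


def job_finish_time_with_backup(task_durations, backup_start, backup_duration):
    """Toy model: at `backup_start`, every still-running task gets one backup copy
    that takes `backup_duration`; a task completes when either copy completes."""
    finish_times = []
    for duration in task_durations:
        if duration > backup_start:
            finish_times.append(min(duration, backup_start + backup_duration))
        else:
            finish_times.append(duration)
    return max(finish_times)


# Hypothetical durations in minutes: three normal tasks and one straggler.
durations = [10, 11, 12, 120]

no_backup = job_finish_time(durations)
# Straggler caused by a slow machine: the backup on a healthy node is fast.
slow_node = job_finish_time_with_backup(durations, backup_start=12, backup_duration=11)
# Straggler caused by skewed input: the backup has the same work and is just as slow.
skewed_input = job_finish_time_with_backup(durations, backup_start=12, backup_duration=120)

print(no_backup, slow_node, skewed_input)
```

In this model a backup only helps when the slowness comes from the node. If the task is slow because of its input, as suggested earlier in the thread, the backup copy takes just as long and the job finishes no sooner.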