-
can't disable speculative execution?
Yang 2012-07-12, 03:46
I set the following params to false in my Pig script (Pig 0.10.0):
SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;
I also verified in job.xml in the JobTracker UI that they are indeed set correctly.
When the job finished, the JobTracker UI showed only one attempt for each task (in fact, I have only 1 task).
But when I went to the TaskTracker node and looked under the /var/log/hadoop/userlogs/job_id_here/ dir, there were 3 attempt dirs:
job_201207111710_0024 # ls
attempt_201207111710_0024_m_000000_0 attempt_201207111710_0024_m_000001_0 attempt_201207111710_0024_m_000002_0 job-acls.xml
So 3 attempts were indeed fired?
I have to get this under control because I'm trying to debug the mappers through Eclipse, but if more than 1 mapper process is fired, they all try to connect to the same debugger port, and the end result is that none of them is able to hook up to the debugger.
Thanks
Yang
-
Re: can't disable speculative execution?
Harsh J 2012-07-12, 05:05
Your problem comes more from the fact that you are running more than 1 map slot per TT (TaskTracker), so multiple mappers run at the same time, all trying to bind to the same port. Limit your TT's max map tasks to 1 when you're relying on such techniques to debug, or use the LocalJobRunner or Apache MRUnit instead.
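To illustrate the port collision described above, here is a minimal Python sketch. It assumes each task process acts as a debug server that listens on one fixed port; the helper name `try_bind` and the use of a loopback address are hypothetical choices for the demonstration, not anything taken from Hadoop.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use": another process holds the port.
        sock.close()
        return None


# The first "task" takes a free port picked by the OS (port 0 means "any free port").
first = try_bind(0)
port = first.getsockname()[1]

# The second "task" asks for the very same port while the first still holds it.
second = try_bind(port)
print("second bind succeeded:", second is not None)

first.close()
```

With only one map slot per TaskTracker, at most one such process would be alive on the node at a time, so the second request would not collide.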
-- Harsh J
-
Re: can't disable speculative execution?
Yang 2012-07-12, 05:07
Yes, let me try that.
Changing the max mapper slots actually requires changing the Hadoop config, since I just found out that it's a "final" param.
-
Re: can't disable speculative execution?
Harsh J 2012-07-12, 05:00
Yang,
No, those three are individual task attempts: one attempt each for three different tasks, not three attempts of the same task.
This is how you may generally dissect an attempt ID when reading it:
attempt_201207111710_0024_m_000000_0
1. "attempt" - Indicates it's an attempt ID you'll be reading.
2. "201207111710" - The JobTracker timestamp ID, indicating which instance of the JT ran this job.
3. "0024" - The job ID for which this was a task attempt.
4. "m" - Indicates this is a mapper (reducers are "r").
5. "000000" - The task ID of the mapper (000000 is the first mapper, 000001 is the second, etc.).
6. "0" - The attempt # for the task ID. 0 means it is the first attempt, 1 indicates the second attempt, etc.
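The breakdown above can be sketched as a small Python parser. This is only an illustration of the naming scheme as described in this thread; the names `AttemptId` and `parse_attempt_id` are hypothetical and are not part of any Hadoop API.

```python
from collections import namedtuple

# Field names are illustrative; they mirror the six parts listed above.
AttemptId = namedtuple(
    "AttemptId", "jt_start job_number task_type task_number attempt_number"
)


def parse_attempt_id(attempt_id):
    """Split an ID such as attempt_201207111710_0024_m_000000_0 into its parts."""
    parts = attempt_id.split("_")
    if len(parts) != 6 or parts[0] != "attempt":
        raise ValueError("not an attempt ID: %r" % attempt_id)
    _, jt_start, job, task_type, task, attempt = parts
    if task_type not in ("m", "r"):
        raise ValueError("unknown task type: %r" % task_type)
    return AttemptId(jt_start, int(job), task_type, int(task), int(attempt))


# The three directory names reported in the original question.
dirs = [
    "attempt_201207111710_0024_m_000000_0",
    "attempt_201207111710_0024_m_000001_0",
    "attempt_201207111710_0024_m_000002_0",
]
parsed = [parse_attempt_id(d) for d in dirs]

# Distinct (type, task number) pairs: one entry per task, not per attempt.
distinct_tasks = {(p.task_type, p.task_number) for p in parsed}
print(len(distinct_tasks), "tasks;", [p.attempt_number for p in parsed])
```

Run against the three directory names from the question, this shows three different task numbers, each with attempt number 0, which is why they are not speculative re-runs of one task.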
-- Harsh J
-
Re: can't disable speculative execution?
Yang 2012-07-12, 05:06
Thanks Harsh, I see.
Then there seems to be a small problem with the Splitter / InputFormat.
I'm just reading a 1-line text file through Pig:
A = LOAD 'myinput.txt' ;
Supposedly it should generate at most 1 mapper.
But in reality, it seems that Pig generated 3 mappers and basically fed empty input to 2 of them.
Thanks
Yang
-
Re: can't disable speculative execution?
Harsh J 2012-07-12, 05:14
Try passing mapred.map.tasks = 0 or set a higher min-split size?
-- Harsh J
-
Re: can't disable speculative execution?
Harsh J 2012-07-12, 05:15
Er, sorry I meant mapred.map.tasks = 1
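As a rough illustration of why both knobs are suggested, the sketch below mirrors the split-size rule used by the classic org.apache.hadoop.mapred.FileInputFormat as I understand it: the requested map count is only a hint that sets a goal size, and the minimum split size puts a floor under it. The function name and parameters are hypothetical, and whether Pig 0.10 goes through this code path is an assumption that the thread does not settle.

```python
def compute_split_size(total_size, num_splits_hint, block_size, min_split_size=1):
    """Sketch of the old-API rule: max(min_size, min(goal_size, block_size))."""
    # A hint of 0 is treated like 1 so that the division is always defined.
    goal_size = total_size // (num_splits_hint if num_splits_hint > 0 else 1)
    return max(min_split_size, min(goal_size, block_size))


BLOCK = 64 * 1024 * 1024  # assumed 64 MB block size, for the example only

# A 20-byte input with a hint of 1 map: the split covers the whole file.
print(compute_split_size(20, 1, BLOCK))

# The same input with a hint of 3 maps: the goal size shrinks to 6 bytes.
print(compute_split_size(20, 3, BLOCK))

# Raising the minimum split size overrides the hint.
print(compute_split_size(20, 3, BLOCK, min_split_size=BLOCK))
```

Under this rule a tiny file yields a single split only when the hint is 1 (or 0) or when the minimum split size is at least the file size.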
-- Harsh J
-
Re: can't disable speculative execution?
Yang 2012-07-12, 06:39
Thanks Harsh
I did set mapred.map.tasks = 1, but I can still consistently see 3 mappers being invoked, and the order is always like this:
****_00002_0 ***_00000_0 ***_00001_0
The 00002_0 and 00001_0 tasks are the ones that consume 0 data.
This does look like a bug. You could try it with a simple Pig test.
Yang