Hadoop user mailing list - can't disable speculative execution?


Re: can't disable speculative execution?
Yang 2012-07-12, 06:39
Thanks Harsh

I did set mapred.map.tasks = 1

but I can still consistently see 3 mappers being invoked

and the order is always like this:

****_00002_0
***_00000_0
***_00001_0

the 00002_0 and 00001_0 tasks are the ones that consume 0 data.
This does look like a bug; you could try reproducing it with a simple pig test:
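For example, something like this (the same settings discussed in this
thread; the DUMP is just to force a job to run) should be enough to show it:

SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;
SET mapred.map.tasks 1;
A = LOAD 'myinput.txt';
DUMP A;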

Yang

On Wed, Jul 11, 2012 at 10:15 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Er, sorry I meant mapred.map.tasks = 1
>
> On Thu, Jul 12, 2012 at 10:44 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> > Try passing mapred.map.tasks = 0 or set a higher min-split size?
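> >
> > In the pig script that would be something along the lines of (MR1
> > property name; the size value is just an example):
> >
> > SET mapred.min.split.size 536870912;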
> >
> > On Thu, Jul 12, 2012 at 10:36 AM, Yang <[EMAIL PROTECTED]> wrote:
> >> Thanks Harsh
> >>
> >> I see
> >>
> >> then there seem to be some small problems with the Splitter/InputFormat.
> >>
> >> I'm just reading a 1-line text file through pig:
> >>
> >> A = LOAD 'myinput.txt';
> >>
> >> supposedly it should generate at most 1 mapper.
> >>
> >> but in reality, it seems that pig generated 3 mappers, and basically
> >> fed empty input to 2 of the mappers.
> >>
> >>
> >> Thanks
> >> Yang
> >>
> >> On Wed, Jul 11, 2012 at 10:00 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >>> Yang,
> >>>
> >>> No, those three are individual task attempts.
> >>>
> >>> This is how you may generally dissect an attempt ID when reading it:
> >>>
> >>> attempt_201207111710_0024_m_000000_0
> >>>
> >>> 1. "attempt" - indicates it's an attempt ID you'll be reading
> >>> 2. "201207111710" - The job tracker timestamp ID, indicating which
> >>> instance of JT ran this job
> >>> 3. "0024" - The Job ID for which this was a task attempt
> >>> 4. "m" - Indicating this is a mapper (reducers are "r")
> >>> 5. "000000" - The task ID of the mapper (000000 is the first mapper,
> >>> 000001 is the second, etc.)
> >>> 6. "0" - The attempt # for the task ID. 0 means it is the first
> >>> attempt, 1 indicates the second attempt, etc.
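> >>>
> >>> So for your listing below, attempt_201207111710_0024_m_000002_0 is the
> >>> first attempt (_0) of the third map task (000002) of job 0024.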
> >>>
> >>> On Thu, Jul 12, 2012 at 9:16 AM, Yang <[EMAIL PROTECTED]> wrote:
> >>> > I set the following params to be false in my pig script (0.10.0)
> >>> >
> >>> > SET mapred.map.tasks.speculative.execution false;
> >>> > SET mapred.reduce.tasks.speculative.execution false;
> >>> >
> >>> >
> >>> > I also verified in the jobtracker UI in the job.xml that they are
> >>> > indeed set correctly.
> >>> >
> >>> > when the job finished, the jobtracker UI shows that there is only
> >>> > one attempt for each task (in fact I have only 1 task too).
> >>> >
> >>> > but when I went to the tasktracker node and looked under the
> >>> > /var/log/hadoop/userlogs/job_id_here/
> >>> > dir, there are 3 attempt dirs:
> >>> >
> >>> > job_201207111710_0024 # ls
> >>> > attempt_201207111710_0024_m_000000_0  attempt_201207111710_0024_m_000001_0
> >>> > attempt_201207111710_0024_m_000002_0  job-acls.xml
> >>> >
> >>> > so 3 attempts were indeed fired??
> >>> >
> >>> > I have to get this controlled correctly because I'm trying to debug
> >>> > the mappers through eclipse, but if more than 1 mapper process is
> >>> > fired, they all try to connect to the same debugger port, and the
> >>> > end result is that nobody is able to hook to the debugger.
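> >>> >
> >>> > (the debug options I pass are along these lines, with the port just
> >>> > an example:
> >>> > SET mapred.child.java.opts '-agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=localhost:8000';
> >>> > so every mapper JVM tries to attach to the same address)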
> >>> >
> >>> >
> >>> > Thanks
> >>> > Yang
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>>
> >
> >
> >
> > --
> > Harsh J
>
>
>
> --
> Harsh J
>