Re: can't disable speculative execution?
Thanks Harsh

I did set mapred.map.tasks = 1

but I still consistently see 3 mappers being invoked

and the order is always like this:

****_00002_0
***_00000_0
***_00001_0

the 00002_0 and 00001_0 tasks are the ones that consume 0 data
this does look like a bug
---- you could try with a simple pig test
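
A minimal version of that test might look like this (a sketch, assuming a
one-line file myinput.txt already in HDFS; the SET values are the ones
discussed in this thread):

SET mapred.map.tasks.speculative.execution false;
SET mapred.reduce.tasks.speculative.execution false;
SET mapred.map.tasks 1;

A = LOAD 'myinput.txt';
DUMP A;

With a one-line input this should need only a single map task, so any extra
attempt dirs that still show up under /var/log/hadoop/userlogs/ would point
at the splitting behavior rather than speculation.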

Yang

On Wed, Jul 11, 2012 at 10:15 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Er, sorry I meant mapred.map.tasks = 1
>
> On Thu, Jul 12, 2012 at 10:44 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> > Try passing mapred.map.tasks = 0 or set a higher min-split size?
> >
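> > From inside the Pig script, raising the min split size might look like
> > this (a sketch; the property name is the Hadoop 1.x one, and the value
> > is only illustrative):
> >
> > SET mapred.min.split.size 268435456; -- 256 MB floor per split, in bytes
> >
> > Fewer, larger splits means fewer map tasks on big inputs.
> >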
> > On Thu, Jul 12, 2012 at 10:36 AM, Yang <[EMAIL PROTECTED]> wrote:
> >> Thanks Harsh
> >>
> >> I see
> >>
> >> then there seems to be a small problem with the Splitter / InputFormat.
> >>
> >> I'm just reading a 1-line text file through pig:
> >>
> >> A = LOAD 'myinput.txt' ;
> >>
> >> supposedly it should generate at most 1 mapper.
> >>
> >> but in reality, it seems that pig generated 3 mappers, and basically
> >> fed empty input to 2 of the mappers
> >>
> >>
> >> Thanks
> >> Yang
> >>
> >> On Wed, Jul 11, 2012 at 10:00 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >>> Yang,
> >>>
> >>> No, those three are individual task attempts.
> >>>
> >>> This is how you may generally dissect an attempt ID when reading it:
> >>>
> >>> attempt_201207111710_0024_m_000000_0
> >>>
> >>> 1. "attempt" - indicates it's an attempt ID you'll be reading
> >>> 2. "201207111710" - The job tracker timestamp ID, indicating which
> >>> instance of JT ran this job
> >>> 3. "0024" - The Job ID for which this was a task attempt
> >>> 4. "m" - Indicating this is a mapper (reducers are "r")
> >>> 5. "000000" - The task ID of the mapper (000000 is the first mapper,
> >>> 000001 is the second, etc.)
> >>> 6. "0" - The attempt # for the task ID. 0 means it is the first
> >>> attempt, 1 indicates the second attempt, etc.
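> >>>
> >>> Applied to your listing: m_000000, m_000001 and m_000002 are three
> >>> distinct map tasks, each on its first attempt (trailing _0), not
> >>> three speculative attempts of one task.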
> >>>
> >>> On Thu, Jul 12, 2012 at 9:16 AM, Yang <[EMAIL PROTECTED]> wrote:
> >>> > I set the following params to be false in my pig script (0.10.0)
> >>> >
> >>> > SET mapred.map.tasks.speculative.execution false;
> >>> > SET mapred.reduce.tasks.speculative.execution false;
> >>> >
> >>> >
> >>> > I also verified in the jobtracker UI in the job.xml that they are
> >>> > indeed set correctly.
> >>> >
> >>> > when the job finished, jobtracker UI shows that there is only one
> >>> > attempt for each task (in fact I have only 1 task too).
> >>> >
> >>> > but when I went to the tasktracker node and looked under the
> >>> > /var/log/hadoop/userlogs/job_id_here/
> >>> > dir, there are 3 attempt dirs:
> >>> > job_201207111710_0024 # ls
> >>> > attempt_201207111710_0024_m_000000_0  attempt_201207111710_0024_m_000001_0
> >>> > attempt_201207111710_0024_m_000002_0  job-acls.xml
> >>> >
> >>> > so 3 attempts were indeed fired ??
> >>> >
> >>> > I have to get this controlled correctly because I'm trying to debug
> >>> > the mappers through eclipse,
> >>> > but if more than 1 mapper process is fired, they all try to connect
> >>> > to the same debugger port, and the end result is that nobody is able
> >>> > to hook to the debugger.
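> >>> >
> >>> > (For context, one common way to wire that up, a sketch rather than
> >>> > anything from this thread, is to pass a JDWP agent in the task JVM
> >>> > options from the Pig script:
> >>> >
> >>> > SET mapred.child.java.opts '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000';
> >>> >
> >>> > Every task JVM inherits the same address=8000, which is exactly why
> >>> > a single mapper matters here.)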
> >>> >
> >>> >
> >>> > Thanks
> >>> > Yang
> >>>
> >>>
> >>>
> >>> --
> >>> Harsh J
> >>>
> >
> >
> >
> > --
> > Harsh J
>
>
>
> --
> Harsh J
>