MapReduce >> mail # user >> Speculative Execution and Streaming


Re: Speculative Execution and Streaming
Greg,

> Does anybody know whether or not speculative execution works with Hadoop
> streaming?
>
> If so, I have a script that does not appear to ever launch redundant mappers
> for the slow performers. This may be due to the fact that each mapper
> quickly reports (inaccurately) that it is 100% complete. I am using the
> NLineInputFormat and each mapper gets 17 lines of input. Each line requires
> a lot of computation. It appears that all 17 lines immediately get counted
> as being processed early on. Is there any way to report or force accurate
> completion stats? Could this explain why speculative execution never gets
> triggered?
>

I am wondering if you are hitting
https://issues.apache.org/jira/browse/MAPREDUCE-1073.

In MapReduce Pipes jobs, the map task's progress jumps to 100% as soon as
the input has been read, because the actual processing happens
asynchronously. As Sreekanth notes, this would keep speculation from
working as expected.
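
To make the effect concrete, here is a toy model in Python. It is only a
sketch and not Hadoop code: the function name simulate and the two progress
figures are made up for illustration, and it assumes the framework reports
progress purely from how much input it has handed over, while the external
process works through the buffered records afterwards. The 17 records match
the 17 lines per mapper mentioned in the question.

```python
from collections import deque


def simulate(total_records):
    """Toy model of input-based progress with asynchronous processing.

    Returns a list of (reported_progress, actual_progress) pairs.
    This is an illustration only, not how Hadoop is implemented.
    """
    pending = deque()   # records handed to the external process, not yet done
    records_read = 0    # what the (hypothetical) progress figure is based on
    records_done = 0    # what has really been computed
    timeline = []

    # Phase 1: the framework side reads every input record and hands it over.
    # Reading is cheap, so this finishes almost immediately.
    for record in range(total_records):
        pending.append(record)
        records_read += 1
    timeline.append((records_read / total_records,
                     records_done / total_records))

    # Phase 2: the external process works through the records one by one.
    # The reported figure no longer changes during this phase.
    while pending:
        pending.popleft()
        records_done += 1
        timeline.append((records_read / total_records,
                         records_done / total_records))

    return timeline


if __name__ == "__main__":
    for reported, actual in simulate(17):
        print(f"reported={reported:.0%} actual={actual:.0%}")
```

In this model the reported figure is already at its maximum before any real
work has been done, so a scheduler that compares reported progress across
tasks would see nothing that looks slow.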

Thanks
Hemanth