Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce >> mail # user >> Keeping Map-Tasks alive


+
Yaron Gonen 2012-08-05, 10:47
+
Harsh J 2012-08-05, 16:49
Copy link to this message
-
Re: Keeping Map-Tasks alive
Thanks for the fast reply, but I don't see how a custom record reader will
help.
Consider again the k-means: the mappers need to stand-by until all the
reducers finish to calculate the new clusters' center. Only then, after the
reducers finish their work, the stand-by mappers get back to life and
perform their work.

On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Sure you can, as we provide pluggable code points via the API. Just write
> a custom record reader that doubles the work (first round reads actual
> input, second round reads your known output and reiterates). In the mapper,
> separate the first and second logic via a flag.
>
>
> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> Is there a way to keep a map-task alive after it has finished its work,
>> to later perform another task on its same input?
>> For example, consider the k-means clustering algorithm (k-means
>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>> implementation<http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>> The only thing changing between iterations is the clusters centers. All the
>> input points remain the same. Keeping the mapper alive, and performing the
>> next round of map-tasks on the same node will save a lot of communication
>> cost.
>>
>> Thanks,
>> Yaron
>>
>
>
>
> --
> Harsh J
>
+
Harsh J 2012-08-05, 22:21
+
Yaron Gonen 2012-08-06, 07:23
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB