MapReduce >> mail # user >> Keeping Map-Tasks alive


Re: Keeping Map-Tasks alive
Thanks.
As I see it, this cannot be done in the MapReduce 1 framework without
changing the TaskTracker and JobTracker.
The problem is that I'm not familiar with YARN at all... it might be possible there.
Thanks again!
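For context on the cost being discussed: each k-means pass re-scans every input point while only the centers change, which in MR1 means a fresh job (and fresh map tasks re-reading the same splits) per iteration. A plain-Java sketch of one iteration over 1-D points, just to make the recomputation pattern concrete (class and method names are illustrative, not from any Hadoop code):

```java
import java.util.Arrays;

public class KMeansLoop {
    // One k-means iteration: assign each point to its nearest center,
    // then recompute each center as the mean of its assigned points.
    static double[] iterate(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {                  // the full input is re-scanned...
            int best = 0;
            for (int c = 1; c < centers.length; c++)
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) best = c;
            sum[best] += p;
            count[best]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++)   // ...but only the centers change
            next[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        return next;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 10, 11};
        double[] centers = {0, 5};
        for (int i = 0; i < 10; i++)               // in MR1, each pass is a new job
            centers = iterate(points, centers);
        System.out.println(Arrays.toString(centers)); // prints [1.5, 10.5]
    }
}
```

In the Hadoop version, the `iterate` call corresponds to one full map/reduce job, so the per-iteration input re-read is exactly the communication cost at issue.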

On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Ah, my bad - I skipped over the K-Means part of your original post.
>
> There currently isn't a way to do this with the existing MR framework and
> APIs. A Reducer is initiated upon map completion, and the Task JVM is torn
> down after the maps end. Perhaps you can use YARN to build something like
> what you desire?
>
>
> On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the fast reply, but I don't see how a custom record reader
>> will help.
>> Consider k-means again: the mappers need to stand by until all the
>> reducers have finished calculating the new cluster centers. Only then
>> do the standby mappers come back to life and perform the next round of
>> work.
>>
>>
>> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Sure you can, as we provide pluggable code points via the API. Just
>>> write a custom record reader that doubles the work (the first round reads
>>> the actual input; the second round reads your known output and
>>> reiterates). In the mapper, separate the first- and second-round logic
>>> via a flag.
>>>
>>>
>>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>> Is there a way to keep a map-task alive after it has finished its work,
>>>> to later perform another task on its same input?
>>>> For example, consider the k-means clustering algorithm (k-means
>>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>>>> implementation<http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>>>> The only thing changing between iterations is the clusters centers. All the
>>>> input points remain the same. Keeping the mapper alive, and performing the
>>>> next round of map-tasks on the same node will save a lot of communication
>>>> cost.
>>>>
>>>> Thanks,
>>>> Yaron
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
>
> --
> Harsh J
>
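The workaround Harsh suggests (a record reader that replays the split, with the mapper branching on a round flag) might be sketched like this. Note this is written outside the Hadoop API for brevity; a real implementation would extend org.apache.hadoop.mapreduce.RecordReader, and the class and method names here are illustrative:

```java
import java.util.List;

// Replays the records of one input split twice, tagging each record with
// the round it belongs to, so the mapper can branch on the round flag
// (round 0 = actual input, round 1 = the replayed second pass).
public final class DoublingReader {
    private final List<String> split;  // records of one input split
    private int round = 0;
    private int pos = 0;

    public DoublingReader(List<String> split) {
        this.split = split;
    }

    // Mimics RecordReader.nextKeyValue(): true while records remain.
    public boolean hasNext() {
        return round < 2 && !split.isEmpty();
    }

    // Returns "round<TAB>value"; the mapper splits on the tab and runs
    // its first-round or second-round logic depending on the flag.
    public String next() {
        String tagged = round + "\t" + split.get(pos);
        if (++pos == split.size()) {   // rewind the split for round two
            pos = 0;
            round++;
        }
        return tagged;
    }

    public static void main(String[] args) {
        DoublingReader r = new DoublingReader(List.of("a", "b"));
        while (r.hasNext())
            System.out.println(r.next()); // 0:a, 0:b, then 1:a, 1:b
    }
}
```

As the thread concludes, though, this only doubles the passes within a single job; it does not keep map tasks alive across k-means iterations, since the Task JVM is torn down once the maps end.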