Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce, mail # user - Keeping Map-Tasks alive


Copy link to this message
-
Re: Keeping Map-Tasks alive
Yaron Gonen 2012-08-06, 07:23
Thanks.
As I see it, it cannot be done in the MapReduce 1 framework without
changing TaskTracker and JobTracker.
Problem is I'm not familiar at all with YARN... it might be possible there.
Thanks again!

On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Ah, my bad - I skipped over the K-Means part of your original post.
>
> There currently isn't a way to do this with the existing MR framework and
> APIs. A Reducer is initiated upon map completion and the Task JVM is canned
> away after the Maps end. Perhaps you can use YARN to write something of
> what you desire?
>
>
> On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <[EMAIL PROTECTED]>wrote:
>
>> Thanks for the fast reply, but I don't see how a custom record reader
>> will help.
>> Consider again the k-means: the mappers need to stand-by until all the
>> reducers finish to calculate the new clusters' center. Only then, after the
>> reducers finish their work, the stand-by mappers get back to life and
>> perform their work.
>>
>>
>> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Sure you can, as we provide pluggable code points via the API. Just
>>> write a custom record reader that doubles the work (first round reads
>>> actual input, second round reads your known output and reiterates). In the
>>> mapper, separate the first and second logic via a flag.
>>>
>>>
>>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]>wrote:
>>>
>>>> Hi,
>>>> Is there a way to keep a map-task alive after it has finished its work,
>>>> to later perform another task on its same input?
>>>> For example, consider the k-means clustering algorithm (k-means
>>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>>>> implementation<http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>>>> The only thing changing between iterations is the clusters centers. All the
>>>> input points remain the same. Keeping the mapper alive, and performing the
>>>> next round of map-tasks on the same node will save a lot of communication
>>>> cost.
>>>>
>>>> Thanks,
>>>> Yaron
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
>
> --
> Harsh J
>