Yaron Gonen 2012-08-05, 10:47
Harsh J 2012-08-05, 16:49
Yaron Gonen 2012-08-05, 18:41
Harsh J 2012-08-05, 22:21
Re: Keeping Map-Tasks alive
Yaron Gonen 2012-08-06, 07:23
As I see it, it cannot be done in the MapReduce 1 framework without
changing TaskTracker and JobTracker.
The problem is that I'm not at all familiar with YARN; it might be possible there.
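
For concreteness, here is a minimal sketch of the iteration structure discussed in this thread, written in plain Java rather than against the Hadoop API. It only illustrates why the input points stay fixed while the cluster centers are the sole thing that changes between rounds. The class and method names (KMeansIterationSketch, nearest, iterate, run) are hypothetical and are not part of Hadoop; a real job would typically be resubmitted once per iteration by a driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of one k-means round per "job".
public class KMeansIterationSketch {

    // "Map" step for one point: index of the nearest center (squared Euclidean distance).
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centers.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // One full round: group points by nearest center ("map"), then average each group ("reduce").
    static double[][] iterate(double[][] points, double[][] centers) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centers), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centers.length][];
        for (int i = 0; i < centers.length; i++) {
            List<double[]> members = groups.get(i);
            if (members == null) {
                // No point was assigned to this center; keep it unchanged.
                next[i] = centers[i].clone();
                continue;
            }
            double[] mean = new double[centers[i].length];
            for (double[] p : members) {
                for (int j = 0; j < mean.length; j++) {
                    mean[j] += p[j];
                }
            }
            for (int j = 0; j < mean.length; j++) {
                mean[j] /= members.size();
            }
            next[i] = mean;
        }
        return next;
    }

    // Driver loop: the same points are re-read every round; only the centers are replaced.
    static double[][] run(double[][] points, double[][] initial, int maxIterations) {
        double[][] centers = initial;
        for (int it = 0; it < maxIterations; it++) {
            double[][] next = iterate(points, centers);
            if (Arrays.deepEquals(next, centers)) {
                break; // converged: centers did not move
            }
            centers = next;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0}, {1.0}, {9.0}, {10.0}};
        double[][] initial = {{0.0}, {10.0}};
        System.out.println(Arrays.deepToString(run(points, initial, 10)));
    }
}
```

In the driver loop, every call to iterate scans the same points array again, which corresponds to the repeated input reads that keeping the map tasks alive would avoid.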
On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Ah, my bad - I skipped over the K-Means part of your original post.
> There currently isn't a way to do this with the existing MR framework and
> APIs. A Reducer is initiated upon map completion and the Task JVM is torn
> down after the Maps end. Perhaps you can use YARN to write something of
> what you desire?
> On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>> Thanks for the fast reply, but I don't see how a custom record reader
>> will help.
>> Consider k-means again: the mappers need to stand by until all the
>> reducers finish calculating the new cluster centers. Only then, after the
>> reducers finish their work, do the standby mappers come back to life and
>> perform their work.
>> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> Sure you can, as we provide pluggable code points via the API. Just
>>> write a custom record reader that doubles the work (first round reads
>>> actual input, second round reads your known output and reiterates). In the
>>> mapper, separate the first and second logic via a flag.
>>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>>>> Is there a way to keep a map-task alive after it has finished its work,
>>>> so that it can later perform another task on the same input?
>>>> For example, consider the k-means clustering algorithm (k-means
>>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>>>> The only thing changing between iterations is the cluster centers. All the
>>>> input points remain the same. Keeping the mapper alive, and performing the
>>>> next round of map-tasks on the same node will save a lot of communication
>>> Harsh J
> Harsh J