Thanks for the fast reply, but I don't see how a custom record reader will
Consider again the k-means: the mappers need to stand-by until all the
reducers finish to calculate the new clusters' center. Only then, after the
reducers finish their work, the stand-by mappers get back to life and
perform their work.
On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Sure you can, as we provide pluggable code points via the API. Just write
> a custom record reader that doubles the work (first round reads actual
> input, second round reads your known output and reiterates). In the mapper,
> separate the first and second logic via a flag.
> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
>> Is there a way to keep a map-task alive after it has finished its work,
>> to later perform another task on its same input?
>> For example, consider the k-means clustering algorithm (k-means
>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>> The only thing changing between iterations is the clusters centers. All the
>> input points remain the same. Keeping the mapper alive, and performing the
>> next round of map-tasks on the same node will save a lot of communication
> Harsh J