MapReduce deliberately does not support this. Reduce partitions are
abstract: your MapReduce program cannot bind a partition to a particular
physical node. This is for fault tolerance. If machine1 crashes, partition1
can still be rescheduled onto machine3 and the computation can continue.
Sorry, I know that's frustrating for your use case!
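
To make the separation concrete, here is a rough, framework-independent
sketch in Python. It is not the MapReduce API: partition_for, reschedule and
the machine names are hypothetical, and the CRC-based hash is just an
assumption chosen to keep the example reproducible. The point it tries to
show is that a partitioner only ever returns a partition index (in Hadoop,
for instance, Partitioner.getPartition returns an int), while the mapping
from partition to machine belongs to the scheduler and can change.

```python
import zlib


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """Map a key to a partition index in [0, num_reduce_tasks).

    Note what is missing: there is no hostname anywhere in the signature,
    so the partitioner has no way to name a machine.
    """
    # Assumption: a CRC32-based hash, used only so results are stable
    # across runs (Python's built-in hash() of a str is salted per process).
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reduce_tasks


def reschedule(placement: dict, failed: str, spare: str) -> dict:
    """Hypothetical stand-in for the scheduler reacting to a crash.

    Every partition that lived on the failed machine moves to the spare;
    all other partitions stay where they were.
    """
    return {p: (spare if m == failed else m) for p, m in placement.items()}


# Two reduce partitions (R=2), initially placed on machine1 and machine2.
placement = {0: "machine1", 1: "machine2"}

# machine1 crashes; its partition is rescheduled onto machine3.
after_crash = reschedule(placement, failed="machine1", spare="machine3")

# The key-to-partition mapping is unaffected by the move.
keys = ["kv1", "kv2", "kv3", "kv4"]
partitions = {k: partition_for(k, 2) for k in keys}
```

In this toy model the partition of each key is the same before and after
the crash; only the machine that serves that partition changes, which is
why a job cannot rely on "partition1 is on machine1".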
On Fri, Mar 5, 2010 at 8:50 PM, Yanfeng Zhang <[EMAIL PROTECTED]> wrote:
> Hi, all
> The KV pairs (kv1, kv2, kv3, kv4) emitted by the mapper are partitioned
> into R parts (e.g. R=2) by a partitioner. For example, kv1 and kv2 are in
> partition1, while kv3 and kv4 are in partition2. The reducers get their KV
> pairs from these two partitions: reducer1 gets the KV pairs from partition1
> and reducer2 gets the KV pairs from partition2.
> I want to let machine1 get KV pairs from partition1 and machine2 get KV
> pairs from partition2. But reducer1 is not always on machine1, reducer2 is
> not always on machine2. Is there any way to make sure kv1 and kv2 are sent
> to machine1 and kv3, kv4 are sent to machine2?
> Thank you in advance!
> Yanfeng Zhang