There are two parts to this. First, the semantics of MapReduce guarantee that all tuples sharing the same key are sent to the same reducer, so that you can write useful MR applications that do things like “count words” or “summarize by date”. Second, to honor that guarantee, the shuffle phase of MR partitions the map output by key, moving tuples that share a key to the same node where they can be processed together. So partitioning is not only about parallelizing the reduce step; it is what makes per-key results correct once the work is spread across several reducers. You can think of key-partitioning as a strategy that assists in parallel distributed sorting.
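To make that concrete, here is a small single-process sketch of the idea in Python. It is not Hadoop code: the names `partition`, `shuffle`, and `word_count` are made up for illustration, and CRC32 is just a convenient stand-in for a deterministic hash function. Hadoop's default partitioner is also hash-based, but nothing here calls its API.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Map a key to a reducer index. The same key always gets the same index.

    CRC32 is an assumed stand-in hash, chosen because it is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per (simulated) reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def word_count(lines, num_reducers=3):
    # Map phase: emit (word, 1) for every word in every line.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: route each pair to the reducer that owns its key.
    buckets = shuffle(pairs, num_reducers)
    # Reduce phase: each reducer sums the values of its own keys, in key order.
    return [
        {key: sum(values) for key, values in sorted(bucket.items())}
        for bucket in buckets
    ]


if __name__ == "__main__":
    for i, counts in enumerate(word_count(["a b a", "b c a"])):
        print("reducer", i, counts)
```

Because every occurrence of a word lands in the same bucket, each reducer can emit a final count for its keys without consulting any other reducer. If pairs were instead scattered at random, a word's count would be split across reducers and would need another merge pass.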
From: Sai Sai [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 07, 2013 5:17 AM
To: [EMAIL PROTECTED]
Subject: Re: Why/When partitioner is used.
I always get confused why we should partition and what is the use of it.
Why would one want to send all the keys starting with A to Reducer1 and B to Reducer2 and so on?
Is it just to parallelize the reduce process?