It depends on the operation. For example, you might want to aggregate
on a certain field. In M/R you would implement that by emitting a
key/value pair from the mapper and implementing the aggregation
function, say Count or Sum over the key, in the reducer.
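As a rough illustration of that pattern, here is a minimal plain-Python sketch (not Hadoop code). The record layout, the sample data, and the names mapper, reducer and run_job are all assumptions made up for the example; the only point is that the mapper emits (key, 1) and the reducer sums whatever arrives for each key.

```python
from collections import defaultdict

def mapper(record):
    # Emit (key, 1) for each record; the key is the field we aggregate on.
    userid, _accnt = record
    yield userid, 1

def reducer(key, values):
    # Aggregation function: a count, computed as the sum of the emitted 1s.
    return key, sum(values)

def run_job(records):
    # Shuffle step: collect every value emitted for a key into one list,
    # so a single reducer call sees all values for that key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

# Hypothetical sample input: (userid, accnt) pairs.
records = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
counts = run_job(records)
```

Swapping the body of reducer for a different function (a sum over a numeric field, a max, and so on) changes the aggregation without touching the mapper's choice of key.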
To answer your question: you would typically "group by" a certain
field in the tuple, and that field becomes the key on which the
reducers operate. For example:
A = load 'input' as (userid, accnt);
B = group A by userid;
C = foreach B generate group, COUNT(A);
In this example the userid field is the key. It's equivalent to a
context.write(userid, 1) in the map function of plain MR (generally
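To make the "same key, same reducer" guarantee concrete, here is a small plain-Python sketch of how a group key can decide which reducer a tuple lands on. It is only a simulation under stated assumptions: zlib.crc32 stands in for the key hash (Hadoop's default HashPartitioner uses the key's hashCode modulo the number of reduce tasks), and partition, route, the sample tuples and the reducer count of 3 are hypothetical.

```python
import zlib

def partition(key, num_reducers):
    # Stand-in for a hash partitioner: deterministic hash of the key,
    # taken modulo the number of reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def route(tuples, key_index, num_reducers):
    # Send each tuple to the reducer chosen by its group key.
    reducers = [[] for _ in range(num_reducers)]
    for t in tuples:
        reducers[partition(t[key_index], num_reducers)].append(t)
    return reducers

# Hypothetical (userid, accnt) tuples, grouped on field 0 (userid).
tuples = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"), ("carol", "c1")]
reducers = route(tuples, key_index=0, num_reducers=3)
```

Because the reducer index depends only on the key, every tuple with the same userid ends up in the same bucket, no matter how the input was split across mappers.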
On Mar 22, 2013, at 12:39 PM, Mark <[EMAIL PROTECTED]> wrote:
> In map/reduce all values for 1 key are guaranteed to go to the same reducer. Is there something analogous to this in Pig? If so, what determines the key when I output a bunch of tuples?