Pig, mail # user - General Pig store questions


Re: General Pig store questions
Prashant Kommireddi 2013-03-22, 20:33
Hi Mark,

It depends on the operation. For example, one might want to aggregate
based on a certain field. In M/R that would be implemented by emitting
a key/value pair from the mapper and implementing the aggregation
function, say Count or Sum over the key, in the reducer.

To answer your question, you would typically "group by" a certain
field in the tuple, and that field would be the key on which the
reducers operate. For example:

A = load 'input' as (userid, accnt);
B = group A by userid;
C = foreach B generate group, COUNT(A);

In this example the userid field is the key. It's roughly equivalent to
calling context.write(userid, 1) in the map function of plain M/R and
summing the values for each key in the reducer.
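
To make the analogy concrete, here is a minimal sketch in plain Python (not Pig or Hadoop code) of what the group-by-and-count above does conceptually: the map step emits a (key, 1) pair per record, the shuffle brings all values for one key together, and the reduce step sums them. The function names and the sample rows are hypothetical, chosen only for illustration; a real cluster would spread the keys across reducers rather than use a single in-memory dictionary.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (user, 1) for every record, like context.write(user, 1)."""
    for user, _accnt in records:
        yield user, 1


def shuffle(pairs):
    """Collect all values for the same key together, as the shuffle does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values per key, the rough counterpart of COUNT(A)."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample rows standing in for the 'input' relation.
records = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```

The point of the sketch is that the grouping field alone decides which records end up together; the other fields in the tuple play no part in the routing.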


On Mar 22, 2013, at 12:39 PM, Mark <[EMAIL PROTECTED]> wrote:

> In map/reduce all values for 1 key are guaranteed to go to the same reducer. Is there something analogous to this in Pig? If so, what determines the key when I output a bunch of tuples?