Re: Sorting/Partitioning of Pig output
Jonathan Coveney 2013-03-27, 20:41
As far as when the StoreFunc runs, it depends on whether the job is map-only
or map/reduce. It runs in the last phase of the job. Generally this is the
As far as how Pig knows where to send its output, there are keys in Pig.
Basically, a reduce job is necessary any time you have a group, join, or
sort. In the case of a group or join, the key is the group key and the join
key, respectively. In the case of a sort it is more complicated.
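
To make the group/join case concrete, here is a rough sketch of the idea in Python. It is not Pig's or Hadoop's actual code: the function names, the CRC32 hash, and the sample records are all made up for illustration. It assumes the usual hash-style partitioning, where the key's hash modulo the number of reducers picks the reducer, so every record with the same group or join key lands on the same reducer.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer index for a key (hypothetical helper, illustration only)."""
    # CRC32 is a stand-in for the key's hash. Python's built-in hash() is
    # salted per process for strings, so it would not be stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(records, key_index, num_reducers):
    """Route each record to a reducer by its key, then sort by key per reducer."""
    reducers = defaultdict(list)
    for record in records:
        key = record[key_index]
        reducers[partition_for(key, num_reducers)].append(record)
    # Each reducer sees its records ordered by key. Records that share a key
    # keep their arrival order here only because Python's sort is stable.
    return {
        reducer: sorted(recs, key=lambda rec: rec[key_index])
        for reducer, recs in reducers.items()
    }


# Made-up (name, value) records, grouped on the name field.
records = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4)]
by_reducer = shuffle(records, key_index=0, num_reducers=2)

for reducer, recs in sorted(by_reducer.items()):
    print(reducer, recs)
```

Both "alice" records end up in the same reducer's list, which is the property a group or join relies on. Nothing in this sketch says anything about the order of records across reducers, which is part of why a sort needs more than this.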
2013/3/27 Mark <[EMAIL PROTECTED]>
> I understand that in the traditional map/reduce paradigm each key gets
> sent to the same reducer, sorted, but in Pig there is no such thing as a
> "key". I'm curious to know how Pig knows which reducer to send its
> output to.
> So when creating a custom StoreFunc, is there any guarantee on the ordering
> of Tuples that come into putNext?
> And another, even more basic question: do StoreFuncs operate in the map
> phase or the reduce phase?