
Mark 2013-03-27, 18:46
Re: Sorting/Partitioning of Pig output
As for when the StoreFunc runs, it depends on whether the job is map-only
or map/reduce. It runs in the last phase of the job, which is generally the
reduce phase.

As for how Pig knows where to send its output, there are keys in Pig.
Basically, a reduce phase is necessary any time you have a group, join, or
sort. In the case of a group or join, the key is the group key or the join
key, respectively. In the case of a sort it is more complicated.
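
To make the routing a bit more concrete, here is a rough sketch in plain Java (not Pig's actual code). The first method uses the same formula as Hadoop's default HashPartitioner, which I am assuming is what applies to a group or join key when no custom partitioner is set. The second illustrates the general idea of range partitioning for a sort. As far as I know, Pig derives the range boundaries by sampling the data first, so the boundaries below are hypothetical, as are the class and method names.

```java
import java.util.Arrays;

public class PartitionSketch {

    // Same formula as Hadoop's HashPartitioner: clear the sign bit so the
    // result is non-negative, then take the remainder by the reducer count.
    // Equal keys always land on the same reducer.
    public static int hashPartition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning for a sort. 'boundaries' must be sorted ascending
    // and holds (numReducers - 1) split points. Keys less than or equal to
    // boundaries[0] go to reducer 0, and so on, so reading the reducer
    // outputs in order gives a globally sorted result.
    public static int rangePartition(String key, String[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        // A negative result encodes the insertion point as -(point) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        int numReducers = 4;
        for (String key : new String[] {"alice", "bob", "alice"}) {
            System.out.println(key + " -> reducer "
                    + hashPartition(key, numReducers));
        }

        // Hypothetical split points for 3 reducers.
        String[] boundaries = {"g", "p"};
        for (String key : new String[] {"apple", "kiwi", "zebra"}) {
            System.out.println(key + " -> reducer "
                    + rangePartition(key, boundaries));
        }
    }
}
```

With hashing, all tuples that share a key reach the same reducer but the reducers are not ordered relative to each other. With ranges, reducer 0 holds the smallest keys and the last reducer the largest, which is what a total sort needs.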
2013/3/27 Mark <[EMAIL PROTECTED]>

> I understand in the traditional map/reduce paradigm that each key will get
> sent to the same reducer sorted but in pig there is no such thing as a
> "key".  I'm curious to know how pig knows to which reducer to send its
> output to?
> So when creating a custom StoreFunc is there any guarantee on the ordering
> of Tuples that come into putNext?
> And another even more basic question. Do StoreFuncs operate at the Map
> phase or Reduce phase?
> Thanks
Yen SYU 2013-03-28, 16:23