NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> Sorting/Partitioning of Pig output


Re: Sorting/Partitioning of Pig output
As for when the StoreFunc runs, it depends on whether the job is map-only
or map/reduce. It runs in the last phase of the job: in a map-only job that
is the map phase; otherwise it is generally the reduce phase.

As for how Pig knows where to send its output, there are keys in Pig.
Basically, a reduce phase is necessary any time you have a group, join, or
sort. In the case of a group or join, the key is the group key or the join
key, respectively. In the case of a sort it is more complicated.
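
To make the key-to-reducer mapping concrete, here is a minimal Java sketch.
It is not Pig's actual code. The class and method names (PartitionSketch,
hashPartition, rangePartition) and the sample keys and cut points are made
up for illustration. hashPartition uses the same arithmetic as Hadoop's
default HashPartitioner; whether a particular Pig job ends up using exactly
that partitioner is an assumption here. rangePartition only illustrates the
general idea behind a sort, where each reducer receives a contiguous range
of keys so that the concatenated reducer outputs are globally ordered. As
far as I know Pig derives such ranges by sampling the input first, but the
sketch simply takes the cut points as given.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a stand-in for the partitioning step, not Pig's classes. */
class PartitionSketch {

    // Group/join case: every tuple with the same key lands on the same reducer.
    // Clearing the sign bit keeps the result non-negative before the modulus.
    static int hashPartition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort case: 'boundaries' are sorted cut points (hypothetical here).
    // Reducer i receives keys <= boundaries[i] that are greater than
    // boundaries[i - 1]; the last reducer receives everything after the
    // final cut point, so there are boundaries.length + 1 reducers.
    static int rangePartition(String key, String[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        // For a missing key, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        int numReducers = 3;
        List<String> groupKeys = Arrays.asList("apple", "banana", "apple", "cherry");
        for (String k : groupKeys) {
            System.out.println(k + " -> reducer " + hashPartition(k, numReducers));
        }

        String[] boundaries = {"g", "p"}; // assumed cut points for 3 reducers
        for (String k : Arrays.asList("apple", "kiwi", "zebra")) {
            System.out.println(k + " -> reducer " + rangePartition(k, boundaries));
        }
    }
}
```

With hash partitioning the two "apple" tuples always go to the same reducer,
but nothing is implied about ordering across reducers. With range
partitioning, reducer 0 holds the smallest keys and the last reducer the
largest, which is what a total sort needs.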
2013/3/27 Mark <[EMAIL PROTECTED]>

> I understand that in the traditional map/reduce paradigm each key is sent
> to a single reducer, in sorted order, but in Pig there is no such thing as
> a "key". I'm curious how Pig knows which reducer to send its output to.
>
> So when creating a custom StoreFunc, is there any guarantee on the ordering
> of the Tuples that come into putNext?
>
> And another even more basic question. Do StoreFuncs operate at the Map
> phase or Reduce phase?
>
> Thanks