NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> DISTINCT and partitioner


Re: DISTINCT and partitioner
You're correct. It looks like an optimization was added that makes DISTINCT use a special partitioner, which prevents the user from setting a custom one. Could you file a JIRA against the docs so we can get that fixed?

Alan.

On Jul 17, 2013, at 11:27 AM, William Oberman wrote:

> The docs say DISTINCT can take a custom partitioner. How does that work?
> What are "K" and "V"?
> I have some doubts that the docs are correct. I wrote a test partitioner
> that does a System.out of K and V. I then wrote simple scripts to do JOIN,
> GROUP and DISTINCT. For JOIN and GROUP I see my System.out output(*). For
> DISTINCT, I see nothing...
>
> Using Pig 0.11.1.
>
> will
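A minimal sketch of the kind of test partitioner described in the quoted message, in Java since Pig partitioners are Java classes. It assumes, following the example in Pig's `PARTITION BY` documentation, that a custom partitioner extends Hadoop's abstract `org.apache.hadoop.mapreduce.Partitioner<K, V>`, where K is the intermediate map-output key and V the map-output value (Pig's documented example uses `PigNullableWritable` and `Writable`). To stay self-contained, the sketch defines a local stand-in for that abstract class and uses `String` keys and values; the names `LoggingPartitionerDemo` and `LoggingPartitioner` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LoggingPartitionerDemo {

    // Stand-in for Hadoop's abstract org.apache.hadoop.mapreduce.Partitioner<K, V>,
    // defined locally so this sketch compiles without Hadoop on the classpath.
    abstract static class Partitioner<K, V> {
        public abstract int getPartition(K key, V value, int numPartitions);
    }

    // Hypothetical test partitioner: K is the map-output key, V the map-output value.
    // In Pig these would be PigNullableWritable and Writable; Strings are used here.
    static class LoggingPartitioner extends Partitioner<String, String> {
        final List<String> log = new ArrayList<>();

        @Override
        public int getPartition(String key, String value, int numPartitions) {
            String line = "K=" + key + " V=" + value;
            System.out.println(line); // the "System.out of K and V" from the thread
            log.add(line);
            // Mask the sign bit so negative hash codes still land in [0, numPartitions).
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) {
        LoggingPartitioner p = new LoggingPartitioner();
        int part = p.getPartition("apple", "1", 4);
        System.out.println("partition=" + part);
    }
}
```

A real version would extend the Hadoop class instead of the stand-in, be packaged in a jar loaded with `REGISTER`, and be referenced from a script with `PARTITION BY`, for example `B = GROUP A BY $0 PARTITION BY LoggingPartitioner;`. The modulo arithmetic mirrors Hadoop's default `HashPartitioner`, so records with equal keys always reach the same reducer.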