Pig >> mail # user >> DISTINCT and paritioner


Re: DISTINCT and paritioner
I forgot to add the footnote for my (*).

(*) For JOIN, my test was basically:
JOIN A by $0, B by $0
And my System.out showed K = $0, and V = A minus $0 (or B minus $0).  E.g. if
A = (1,2,3), then K = 1, and V = (2,3)

For GROUP:
GROUP A by $0
The System.out showed K = $0, V = A minus $0.  E.g. if A = (1,2,3), then
K = 1 and V = (2,3)
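
The key/value split reported above for JOIN and GROUP can be modelled in a few lines of plain Java. This is only a sketch of the observed behaviour, with no Hadoop or Pig dependencies; the class name KeyValueSplit and its methods are hypothetical and are not part of Pig's API or a description of how Pig implements the split internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models the K/V split seen in the
// partitioner's System.out. It is not Pig's actual implementation.
class KeyValueSplit {

    // K: the field the relation is grouped or joined on (here, $0).
    static Object key(List<Object> tuple, int keyIndex) {
        return tuple.get(keyIndex);
    }

    // V: a copy of the original tuple with the key field removed.
    static List<Object> value(List<Object> tuple, int keyIndex) {
        List<Object> rest = new ArrayList<>(tuple);
        rest.remove(keyIndex); // int overload, so this removes by position
        return rest;
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.<Object>asList(1, 2, 3); // A = (1,2,3)
        // prints "K = 1, V = [2, 3]"
        System.out.println("K = " + key(a, 0) + ", V = " + value(a, 0));
    }
}
```

With A = (1,2,3) and the key at position 0, this gives K = 1 and V = (2,3), matching the example above.
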
On Wed, Jul 17, 2013 at 2:27 PM, William Oberman <[EMAIL PROTECTED]> wrote:

> The docs say DISTINCT can take a custom partitioner.  How does that work?
>  What is "K" and "V"?
> I'm having some doubts the docs are correct.  I wrote a test partitioner
> that does a System.out of K and V.  I then wrote simple scripts to do JOIN,
> GROUP and DISTINCT.  For JOIN and GROUP I see my system.outs(*).  For
> DISTINCT, I see nothing....
>
> Using 0.11.1.
>
> will
>