Re: DISTINCT and partitioner
I forgot to PS my (*).

(*) For JOIN, my test was basically:
JOIN A by $0, B by $0
And my System.out showed K = $0, and V = A less $0 (or B less $0), i.e. the
tuple with the key field removed.  E.g. if A = (1,2,3), then K = 1, and V = (2,3)

For GROUP:
GROUP A by $0
Showed K = $0, V = A less $0.  E.g. if A = (1,2,3), then K=1 and V = (2,3)
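
To make the K/V split above concrete, here is a rough, self-contained sketch of what my test partitioner was doing. It is only an illustration: SketchPartitioner is a hypothetical stand-in that mirrors the shape of Hadoop's Partitioner.getPartition(key, value, numPartitions) so the example runs without Hadoop or Pig on the classpath, and KeyValueSketch, keyOf and valueOf are names made up for this example. In a real job the class would extend the actual Hadoop Partitioner, and Pig would pass its own wrapped key and value types rather than plain Java objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the shape of Hadoop's
// Partitioner.getPartition(key, value, numPartitions); not the real class.
abstract class SketchPartitioner<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// Logs every (K, V) pair it is asked to place, like the test partitioner
// described in this thread.
class LoggingPartitioner extends SketchPartitioner<Object, List<Object>> {
    final List<String> seen = new ArrayList<>();

    @Override
    int getPartition(Object key, List<Object> value, int numPartitions) {
        String line = "K = " + key + ", V = " + value;
        seen.add(line);
        System.out.println(line);
        // Mask the sign bit so the partition number is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

final class KeyValueSketch {
    // The key is the field the JOIN/GROUP is done on.
    static Object keyOf(List<Object> tuple, int keyIndex) {
        return tuple.get(keyIndex);
    }

    // The value is the tuple with the key field removed ("A less $0").
    static List<Object> valueOf(List<Object> tuple, int keyIndex) {
        List<Object> rest = new ArrayList<>(tuple);
        rest.remove(keyIndex); // int argument: removes by position
        return rest;
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.<Object>asList(1, 2, 3);
        LoggingPartitioner p = new LoggingPartitioner();
        // Prints "K = 1, V = [2, 3]" for A = (1,2,3) keyed by $0.
        p.getPartition(keyOf(a, 0), valueOf(a, 0), 4);
    }
}
```

This only models what I saw for JOIN and GROUP; it says nothing about what DISTINCT passes to the partitioner, which is the open question.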

On Wed, Jul 17, 2013 at 2:27 PM, William Oberman
<[EMAIL PROTECTED]> wrote:

> The docs say DISTINCT can take a custom partitioner.  How does that work?
>  What is "K" and "V"?
> I'm having some doubts the docs are correct.  I wrote a test partitioner
> that does a System.out of K and V.  I then wrote simple scripts to do JOIN,
> GROUP and DISTINCT.  For JOIN and GROUP I see my system.outs(*).  For
> DISTINCT, I see nothing....
>
> Using 0.11.1.
>
> will
>