Pig >> mail # user >> Re: computing pairwise document similarity

Re: computing pairwise document similarity
Did you ever figure this out? I am running into the same issue and the
responses on this thread have been of no help to me. If you could share the
actual command you used to get this done, it would be greatly appreciated.
The skeleton of my code is below; I am missing only the one command that goes
between the GROUP and the final DUMP.

> DUMP docMat;

doc1 a 2
doc1 b 2
doc2 b 2

> docGrp = GROUP docMat BY word;
> dump docGrp;



> DUMP res;

doc1, doc1, 4
doc1, doc3, 2
doc1, doc4, 2
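
For what it's worth, here is one possible way to fill that gap. It is only a sketch and not necessarily what the original poster ended up doing. It assumes the similarity wanted is the dot product (for each pair of documents, the sum over shared words of the product of the two weights), that the matrix lives in a tab-separated file I am calling 'docmat.tsv' (a hypothetical name), and that the fields are named doc, word and weight. It uses a self-join on word rather than a GROUP, and loads the file under two aliases because Pig's documentation recommends that for self-joins.

```pig
-- Hypothetical input path and field names; adjust to the real data.
docA = LOAD 'docmat.tsv' USING PigStorage('\t')
       AS (doc:chararray, word:chararray, weight:int);
docB = LOAD 'docmat.tsv' USING PigStorage('\t')
       AS (doc:chararray, word:chararray, weight:int);

-- Pair up every two rows that share a word.
pairs = JOIN docA BY word, docB BY word;

-- One partial product per (document, document, shared word).
prods = FOREACH pairs GENERATE
            docA::doc AS d1,
            docB::doc AS d2,
            docA::weight * docB::weight AS prod;

-- Sum the partial products for each document pair.
byPair = GROUP prods BY (d1, d2);
res = FOREACH byPair GENERATE
          FLATTEN(group) AS (d1, d2),
          SUM(prods.prod) AS sim;

DUMP res;
```

As written this emits both orderings of each pair as well as each document paired with itself. If only one row per unordered pair is wanted, a FILTER prods BY d1 < d2 before the GROUP should do it.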