Pig >> mail # user >> Re: computing pairwise document similarity


Re: computing pairwise document similarity
Did you ever figure this out? I am running into the same issue and the
responses on this thread have been of no help to me. If you could share the
actual command you used to get this done, it would be greatly appreciated.
The skeleton of my code is below; the only piece missing is the one statement that turns docGrp into res.

> DUMP docMat;

doc1 a 2
doc1 b 2
doc2 b 2
...

> docGrp = GROUP docMat BY word;
> dump docGrp;

(a,{(doc1,a,2),(doc3,a,1),(doc4,a,1)})
(b,{(doc1,b,2),(doc2,b,2),(doc3,b,1)})
...

> INSERT STATEMENT HERE

> DUMP res;

doc1, doc1, 4
doc1, doc3, 2
doc1, doc4, 2
...
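
To make the target transformation concrete, here is a minimal Python sketch (not Pig, and not the missing statement itself) of what that step would need to compute, judging from the sample output above: each word's bag is crossed with itself and the two counts are multiplied. The names `doc_grp`, `pairwise_products`, and `dot_products` are hypothetical, and the final summing step is an assumption about how the per-word products would later be combined into a similarity score.

```python
from collections import defaultdict
from itertools import product

# Hypothetical in-memory stand-in for docGrp: word -> bag of (doc, word, count).
doc_grp = {
    "a": [("doc1", "a", 2), ("doc3", "a", 1), ("doc4", "a", 1)],
    "b": [("doc1", "b", 2), ("doc2", "b", 2), ("doc3", "b", 1)],
}

def pairwise_products(groups):
    """Cross each word's bag with itself and multiply the two counts."""
    rows = []
    for bag in groups.values():
        for (d1, _, c1), (d2, _, c2) in product(bag, repeat=2):
            rows.append((d1, d2, c1 * c2))
    return rows

def dot_products(rows):
    """Assumed follow-up step: sum the per-word products for each document pair."""
    totals = defaultdict(int)
    for d1, d2, value in rows:
        totals[(d1, d2)] += value
    return dict(totals)

res = pairwise_products(doc_grp)
for row in res[:3]:
    print(row)  # the rows produced from the group for word "a"
```

In Pig itself, a self-join of docMat on word followed by a FOREACH that multiplies the two counts is one common way to express the same idea, though that is offered as a suggestion rather than a tested answer.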

Thanks,
Sergey