HDFS, mail # user - Pairwise Comparison of Large Datasets


Pairwise Comparison of Large Datasets
Rob Styles 2012-12-31, 18:42
Happy New Year :)

Thought some of you might find this useful.

We've developed an approach to pairwise comparison of large datasets that
never requires the whole dataset to be visible at once. It brings each pair
together for comparison by using incrementing coordinates to target a value
at a cell.

http://dynamicorange.com/2012/12/31/pairwise-comparisons-of-large-datasets/
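
For anyone who wants something concrete to poke at, here is a minimal sketch of one way "coordinates targeting a cell" could work. It is only my reading of the summary above, not necessarily what the linked post does. It assumes every record already has an incrementing index and that the total record count is known up front. The names `map_record`, `shuffle` and `pairwise` are made up for illustration, and the in-memory `shuffle` stands in for the grouping a MapReduce framework would do.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Cell = Tuple[int, int]        # (row, col) with row < col
Tagged = Tuple[int, str]      # (record index, record value)


def map_record(index: int, value: str, total: int) -> Iterator[Tuple[Cell, Tagged]]:
    """Emit the record once for every cell it takes part in.

    The mapper only sees one record, its index and the total count
    (both assumptions), never the rest of the dataset.
    """
    for other in range(total):
        if other == index:
            continue
        cell = (min(index, other), max(index, other))
        yield cell, (index, value)


def shuffle(emitted: Iterable[Tuple[Cell, Tagged]]) -> Dict[Cell, List[Tagged]]:
    """Group emitted values by cell, standing in for the framework's shuffle."""
    grouped: Dict[Cell, List[Tagged]] = defaultdict(list)
    for cell, tagged in emitted:
        grouped[cell].append(tagged)
    return grouped


def pairwise(records: List[str], compare: Callable[[str, str], int]) -> Dict[Cell, int]:
    """Compare every unordered pair of records exactly once."""
    total = len(records)
    emitted = (kv for i, rec in enumerate(records) for kv in map_record(i, rec, total))
    results: Dict[Cell, int] = {}
    for cell, tagged in shuffle(emitted).items():
        # Each cell receives exactly two records; sort by index so the
        # argument order does not depend on arrival order.
        (_, left), (_, right) = sorted(tagged)
        results[cell] = compare(left, right)
    return results


def matching_positions(a: str, b: str) -> int:
    """Toy comparison: count positions where the two strings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


if __name__ == "__main__":
    for cell, score in sorted(pairwise(["apple", "apply", "angle"], matching_positions).items()):
        print(cell, score)
```

In this sketch each record is emitted N-1 times, so the shuffle volume grows quadratically with the number of records, and the indexes and total count have to be produced before the job can run.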

There is still work to do on making the approach more efficient and on
eliminating a pre-processing step. Any help would be gratefully received.

If there's a toolset already out there for doing this, I'd be happy to hear
about that too!

thanks

rob