Pairwise Comparison of Large Datasets
Happy New Year :)

Thought some of you might find this useful.

We've developed an approach to pairwise comparison of large datasets that
never requires the whole dataset to be visible at once. The approach brings
pairs together for comparison by using incrementing coordinates to target
the value at a cell.
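
To make the idea concrete, here is a minimal sketch of one possible reading
of it, not the implementation described in this message. It assumes records
are already numbered 0..n-1 and that n is known; the function names
(emit_cells, group_by_cell, compare_cells) and the grid-cell keying are
hypothetical and chosen only for illustration. Each record is handled on its
own and emitted once per cell (i, j) it belongs to; grouping by cell, which
stands in for a MapReduce shuffle, is what brings the two members of a pair
together.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Cell = Tuple[int, int]        # (row, column) coordinate, with row < column
Tagged = Tuple[int, str]      # (record index, record value)


def emit_cells(index: int, value: str, n: int) -> Iterator[Tuple[Cell, Tagged]]:
    """Map step: looks at a single record and emits it once for every cell
    (i, j), i < j, that it takes part in. No other record is needed here."""
    for other in range(n):
        if other == index:
            continue
        cell = (min(index, other), max(index, other))
        yield cell, (index, value)


def group_by_cell(emitted: Iterable[Tuple[Cell, Tagged]]) -> Dict[Cell, List[Tagged]]:
    """Stand-in for the shuffle: collects the values that target the same cell."""
    cells: Dict[Cell, List[Tagged]] = defaultdict(list)
    for cell, tagged in emitted:
        cells[cell].append(tagged)
    return cells


def compare_cells(
    cells: Dict[Cell, List[Tagged]],
    compare: Callable[[str, str], int],
) -> Dict[Cell, int]:
    """Reduce step: each cell holds exactly two records; compare them."""
    results: Dict[Cell, int] = {}
    for cell, tagged in cells.items():
        (_, left), (_, right) = sorted(tagged)  # order the pair by record index
        results[cell] = compare(left, right)
    return results


def shared_positions(a: str, b: str) -> int:
    """Toy comparison: number of positions at which the two strings agree."""
    return sum(x == y for x, y in zip(a, b))


records = ["apple", "apply", "angle"]
emitted = [kv for i, v in enumerate(records) for kv in emit_cells(i, v, len(records))]
results = compare_cells(group_by_cell(emitted), shared_positions)
print(results)
```

Note that in this sketch every record is emitted n-1 times, so the shuffled
volume grows quadratically with the number of records; a real job would
likely want to group records into blocks before assigning coordinates.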


There is still work to do on making the approach more efficient and on
eliminating a pre-processing step. Any help would be gratefully received.

If there's already a toolset out there for doing this, I'd be happy to hear
about that too!