HDFS >> mail # user >> Pairwise Comparison of Large Datasets


Pairwise Comparison of Large Datasets
Happy New Year :)

Thought some of you might find this useful.

We've developed an approach to pairwise comparison of large datasets that
never requires visibility of the whole dataset at once. The approach brings
pairs together for comparison by using incrementing coordinates to target a
value at a cell.
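
For readers who want something concrete, here is a minimal sketch of one way
to read that description. It is not taken from the linked post, and the names
(emit_cells, pair_up) are hypothetical. It assumes each record has already
been given a sequential index and that the total record count is known. Each
record is emitted once for every cell (i, j), with i < j, of an imaginary
comparison matrix that it belongs to, and grouping by cell then brings
exactly two records together. The map step only ever looks at one record.

```python
from collections import defaultdict


def emit_cells(index, record, total):
    """Map step: yield (cell, record) for every cell this record belongs to.

    A cell (i, j) with i < j stands for the comparison of record i with
    record j. Only this record's own index and the total count are needed.
    """
    # Cells where this record is the lower-numbered member of the pair.
    for j in range(index + 1, total):
        yield (index, j), record
    # Cells where this record is the higher-numbered member of the pair.
    for i in range(index):
        yield (i, index), record


def pair_up(records):
    """Group the emitted values by cell, standing in for a shuffle/reduce."""
    total = len(records)
    cells = defaultdict(list)
    for index, record in enumerate(records):
        for cell, value in emit_cells(index, record, total):
            cells[cell].append(value)
    return cells


if __name__ == "__main__":
    for cell, pair in sorted(pair_up(["a", "b", "c", "d"]).items()):
        print(cell, pair)
```

In a real MapReduce job the cell would be the key, so each reducer call would
receive just the two records to compare. Note that every record is emitted
total - 1 times, which is the obvious cost of this naive version.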

http://dynamicorange.com/2012/12/31/pairwise-comparisons-of-large-datasets/

There is still work to do on making the approach more efficient and on
eliminating a pre-processing step. Help gratefully received.

If there's a toolset already out there for doing this I'd be happy to hear
about that too!

thanks

rob