Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS >> mail # user >> Cartesian product in hadoop


Copy link to this message
-
Cartesian product in hadoop
Dear all,

I am writing to kindly ask for ideas of doing cartesian product in hadoop.
Specifically, now I have two datasets, each of which contains 20million
lines.
I want to do cartesian product on these two datasets, comparing lines
pairwisely.

The output of each comparison can be mostly filtered by a function ( we do
not output the
whole result of this cartesian product, but only a small part).

I guess one good way is to pass one block from dataset1 and another block
from dataset2
to a mapper, then let the mappers do the product in memory to avoid IO.

Any suggestions?
Thank you very much.

Regards,
Zheyi Rong
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB