HDFS user mailing list: Cartesian product in hadoop

Cartesian product in hadoop
zheyi rong 2013-04-18, 09:47
Dear all,

I am writing to ask for ideas on computing a Cartesian product in Hadoop.
Specifically, I have two datasets, each containing 20 million lines.
I want to compute the Cartesian product of these two datasets, comparing
the lines pairwise.

Most comparison results can be filtered out by a function (we do not output
the whole Cartesian product, only a small part of it).
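For reference, the naive single-machine form of this filtered pairwise comparison is a nested loop, sketched below in Java. The filter `keep` and the sample data are hypothetical stand-ins, since the message does not say what the real comparison function is. With 20 million lines on each side, this loop would perform 20,000,000 x 20,000,000 = 4 x 10^14 comparisons, which is why the work has to be split across many tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class FilteredCartesian {
    /** A matching pair of lines, one from each dataset. */
    record Pair(String left, String right) {}

    /**
     * Compares every line of `left` with every line of `right`
     * (|left| * |right| comparisons) and keeps only the pairs accepted by `keep`.
     */
    static List<Pair> filteredProduct(List<String> left, List<String> right,
                                      BiPredicate<String, String> keep) {
        List<Pair> out = new ArrayList<>();
        for (String a : left) {
            for (String b : right) {
                if (keep.test(a, b)) {
                    out.add(new Pair(a, b));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> d1 = List.of("apple", "banana", "cherry");
        List<String> d2 = List.of("avocado", "blueberry", "date");
        // Hypothetical filter: keep pairs whose lines start with the same letter.
        BiPredicate<String, String> sameInitial = (a, b) -> a.charAt(0) == b.charAt(0);
        for (Pair p : filteredProduct(d1, d2, sameInitial)) {
            System.out.println(p.left() + "\t" + p.right());
        }
    }
}
```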

I think one good approach is to pass one block from dataset1 and one block
from dataset2 to each mapper, and let the mappers compute the product in
memory to avoid IO.
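A minimal, Hadoop-free sketch of the block-pairing idea follows. It assumes each dataset is cut into blocks of at most `blockSize` lines and that every (dataset1 block, dataset2 block) combination becomes one independent unit of work. The names `toBlocks`, `processBlockPair`, and `run` are hypothetical. In an actual Hadoop job each block pair would still have to be delivered to its own mapper, for example through a custom InputFormat whose splits pair one split of each dataset; that part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class BlockPairProduct {
    /** Splits `lines` into consecutive blocks of at most `blockSize` lines. */
    static List<List<String>> toBlocks(List<String> lines, int blockSize) {
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += blockSize) {
            int end = Math.min(start + blockSize, lines.size());
            blocks.add(lines.subList(start, end));
        }
        return blocks;
    }

    /** One "map task": the in-memory, filtered product of a single block pair. */
    static List<String> processBlockPair(List<String> block1, List<String> block2,
                                         BiPredicate<String, String> keep) {
        List<String> out = new ArrayList<>();
        for (String a : block1) {
            for (String b : block2) {
                if (keep.test(a, b)) {
                    out.add(a + "\t" + b);
                }
            }
        }
        return out;
    }

    /**
     * Enumerates every (dataset1 block, dataset2 block) combination, i.e.
     * ceil(n1/B) * ceil(n2/B) tasks. Each line pair falls in exactly one task,
     * so the union of the task outputs is the full filtered product.
     */
    static List<String> run(List<String> d1, List<String> d2, int blockSize,
                            BiPredicate<String, String> keep) {
        List<List<String>> blocks1 = toBlocks(d1, blockSize);
        List<List<String>> blocks2 = toBlocks(d2, blockSize);
        List<String> result = new ArrayList<>();
        for (List<String> b1 : blocks1) {
            for (List<String> b2 : blocks2) {
                // In a real job each (b1, b2) pair would be processed by a separate mapper.
                result.addAll(processBlockPair(b1, b2, keep));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> d1 = List.of("1", "2", "3", "4", "5");
        List<String> d2 = List.of("2", "4", "6");
        // Hypothetical filter: keep pairs of equal lines.
        int tasks = toBlocks(d1, 2).size() * toBlocks(d2, 2).size();
        System.out.println(tasks + " block-pair tasks");
        run(d1, d2, 2, String::equals).forEach(System.out::println);
    }
}
```

Because every line pair lies in exactly one block pair, the union of the task outputs equals the full filtered product with no duplicates. The block size trades memory per task against the number of tasks: assuming, say, 1 million lines per block, two datasets of 20 million lines give 20 x 20 = 400 block-pair tasks, and each block of dataset2 is read once per block of dataset1 (20 times).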

Any suggestions?
Thank you very much.

Regards,
Zheyi Rong

Jagat Singh 2013-04-18, 09:58
Azuryy Yu 2013-04-18, 10:21
Ajay Srivastava 2013-04-18, 10:50
zheyi rong 2013-04-18, 11:12