Hadoop, mail # general - Partitioned Datasets Map/Reduce


Re: Partitioned Datasets Map/Reduce
Hemanth Yamijala 2010-07-06, 04:40
Hi,

> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then, in the next
> MapReduce job, have each mapper handle the same partition from the two
> sources and perform some function such as a join. How can I ensure that
> one mapper gets the splits that correspond to the same partition from both
> sources?
>

Not really an answer to your specific question, but have you taken a
look at Pig (http://hadoop.apache.org/pig), which is well suited to
operations like joining datasets?
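For reference, the idea behind the original question can be sketched in plain Java, outside Hadoop. If both datasets are split with the same partitioning function and the same number of partitions, every key lands in partition i of both datasets, so partition i of A can be joined with partition i of B independently, with no cross-partition lookups. This is a minimal in-memory simulation, not Hadoop API code. The class and method names (`PartitionedJoinSketch`, `partitionFor`, `joinPartition`) are hypothetical. `partitionFor` uses the same formula as Hadoop's `HashPartitioner` (non-negative hash modulo partition count), but applied to `String` keys rather than Writables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionedJoinSketch {

    // Hypothetical stand-in for the custom partitioner: non-negative hash
    // modulo the partition count, the same formula HashPartitioner uses.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Splits one keyed dataset into numPartitions buckets.
    static List<Map<String, String>> partition(Map<String, String> data, int numPartitions) {
        List<Map<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> e : data.entrySet()) {
            parts.get(partitionFor(e.getKey(), numPartitions)).put(e.getKey(), e.getValue());
        }
        return parts;
    }

    // What one "mapper" would do: inner-join partition i of A with partition i of B.
    static List<String> joinPartition(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String match = right.get(e.getKey());
            if (match != null) {
                out.add(e.getKey() + "\t" + e.getValue() + "\t" + match);
            }
        }
        return out;
    }

    // Partitions both inputs identically, then joins matching partitions pairwise.
    static List<String> partitionedJoin(Map<String, String> a, Map<String, String> b, int n) {
        List<Map<String, String>> pa = partition(a, n);
        List<Map<String, String>> pb = partition(b, n);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.addAll(joinPartition(pa.get(i), pb.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> users = new TreeMap<>();
        users.put("u1", "alice");
        users.put("u2", "bob");
        users.put("u3", "carol");
        Map<String, String> orders = new TreeMap<>();
        orders.put("u1", "book");
        orders.put("u3", "lamp");
        orders.put("u4", "pen");
        for (String row : partitionedJoin(users, orders, 4)) {
            System.out.println(row);
        }
    }
}
```

The correctness of the per-partition join rests entirely on both inputs using the same partitioner and the same partition count. If either differs, matching keys can end up in different partition numbers and the join silently drops rows.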