Re: Partitioned Datasets Map/Reduce
Aaron Kimball 2010-07-05, 07:51
One possibility: write out all the partition numbers (one per line) to a
single file, then use NLineInputFormat to make each line its own map
task. Each mapper then receives a single line, "0" or "1" or "2" etc. (with
NLineInputFormat the line arrives as the map value; the key is its byte
offset). Then explicitly open /dataset1/part-(n) and /dataset2/part-(n) in your
mapper.
If you wanted to be more clever, it might be possible to subclass
MultiFileInputFormat to group together both datasets "file-number-wise" when
generating splits, but I don't have specific guidance here.
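To make the first suggestion (one partition number per line, each map task
opening the matching pair of files) a bit more concrete, here is a rough sketch
in plain Java of just the path-pairing logic. It is not Hadoop code: the class
and method names (PartitionPairs, indexFile, pathsForLine) are made up for
illustration, and the zero-padded "part-00000" file naming is an assumption
that should be adjusted to whatever the partitioning job actually wrote. In a
real job, pathsForLine would be called from map() with the input line, and the
two paths would then be opened through the Hadoop FileSystem API.

```java
import java.util.Locale;

// Hypothetical helper; the names here are illustrative, not part of Hadoop.
final class PartitionPairs {

    // Assumption: part files are named part-00000, part-00001, ...
    static String partFile(String datasetDir, int partition) {
        return String.format(Locale.ROOT, "%s/part-%05d", datasetDir, partition);
    }

    // Contents of the index file fed to the job: one partition number per line.
    static String indexFile(int numPartitions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numPartitions; i++) {
            sb.append(i).append('\n');
        }
        return sb.toString();
    }

    // What a single map task would do with its one input line:
    // parse the partition number and derive the two files to open and join.
    static String[] pathsForLine(String line, String leftDir, String rightDir) {
        int n = Integer.parseInt(line.trim());
        return new String[] { partFile(leftDir, n), partFile(rightDir, n) };
    }

    public static void main(String[] args) {
        // Simulate three map tasks, one per line of the index file.
        for (String line : indexFile(3).split("\n")) {
            String[] paths = pathsForLine(line, "/dataset1", "/dataset2");
            System.out.println(line + " -> " + paths[0] + " + " + paths[1]);
        }
    }
}
```

One thing to keep in mind with this design: the map tasks are scheduled
according to where the small index file lives, not where the part files live,
so most of the reads of the actual data will probably be non-local.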
On Sat, Jul 3, 2010 at 9:35 AM, abc xyz <[EMAIL PROTECTED]> wrote:
> Hello everyone,
> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then in the next
> mapreduce job, I want each mapper to handle the same partition from the
> sources and perform some function such as joining etc. How can I ensure that
> one mapper gets the split that corresponds to the same partition from both
> sources?
> Any help would be highly appreciated.