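The mapper should produce (k,1), (1,v) for lines k,v in file1 and should
produce (k,2), (2,v) for lines k,v in file2. Your partition function should
look at only the first member of the key tuple, but should order on both
members. The grouping comparator should likewise look at only the first
member, so that both tagged records for a key arrive in the same reduce call.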
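To make the three roles concrete, here is a minimal plain-Java sketch that has no Hadoop dependency: partition on k only, sort on (k, tag), group on k only. The names TaggedJoinKeys, TaggedKey, TaggedValue, map and partition are hypothetical. In a real job these would become a Partitioner, a sort comparator and a grouping comparator registered with Job.setPartitionerClass, Job.setSortComparatorClass and Job.setGroupingComparatorClass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of the composite key; all names here are hypothetical.
public class TaggedJoinKeys {

    // Map output key: the join key k plus the tag of the source file.
    record TaggedKey(String key, int tag) {}

    // Map output value: the same tag carried alongside the value v.
    record TaggedValue(int tag, String value) {}

    // One map output record: key (k, tag), value (tag, v).
    record MapOutput(TaggedKey key, TaggedValue value) {}

    // Mapper logic: a "k,v" line from file number `tag` becomes ((k, tag), (tag, v)).
    static MapOutput map(String line, int tag) {
        String[] parts = line.split(",", 2);
        return new MapOutput(new TaggedKey(parts[0], tag), new TaggedValue(tag, parts[1]));
    }

    // Partition on the first member only, so every tag for k reaches the same reducer.
    static int partition(TaggedKey key, int numReducers) {
        return (key.key().hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort on both members: by k, then by tag, so file1 records precede file2 records.
    static final Comparator<TaggedKey> SORT_ORDER =
            Comparator.comparing(TaggedKey::key).thenComparingInt(TaggedKey::tag);

    // Group on the first member only, so (k,1) and (k,2) share one reduce call.
    static final Comparator<TaggedKey> GROUPING = Comparator.comparing(TaggedKey::key);

    public static void main(String[] args) {
        List<MapOutput> out = new ArrayList<>();
        out.add(map("a,new", 2));
        out.add(map("a,old", 1));
        out.sort(Comparator.comparing(MapOutput::key, SORT_ORDER));
        for (MapOutput m : out) {
            System.out.println(m.key() + " -> " + m.value()
                    + " (partition " + partition(m.key(), 4) + ")");
        }
    }
}
```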
Your reducer will get data like this:
(k, 1), [(1,v)]
or like this:
(k, 1), [(1,v1),(2,v2)]
In the first case, it should emit k, v. In the second, it should emit k, v2.
Put more simply, it should emit the last value in the reduce group.
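A minimal sketch of that "emit the last value" rule, simulating shuffle and reduce in plain Java rather than using the Hadoop Reducer API. LastValueReducer, Tagged and join are hypothetical names. Because records are sorted by (key, tag), the file2 value, when present, is always last in its group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the reduce side; names are hypothetical.
public class LastValueReducer {

    // A map output record flattened to (key, tag, value).
    record Tagged(String key, int tag, String value) {}

    // Sort by (key, tag), walk each group of equal keys, keep the last value.
    static Map<String, String> join(List<Tagged> mapOutput) {
        List<Tagged> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing(Tagged::key).thenComparingInt(Tagged::tag));
        Map<String, String> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String k = sorted.get(i).key();
            String last = null;
            // One reduce group: all consecutive records sharing the same key.
            while (i < sorted.size() && sorted.get(i).key().equals(k)) {
                last = sorted.get(i).value();
                i++;
            }
            result.put(k, last); // emit k, last value in the group
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tagged> input = List.of(
                new Tagged("b", 2, "b-from-file2"),
                new Tagged("a", 1, "a-from-file1"),
                new Tagged("b", 1, "b-from-file1"));
        System.out.println(join(input)); // prints {a=a-from-file1, b=b-from-file2}
    }
}
```

In an actual Hadoop reducer the same idea is just a loop over the values Iterable that remembers the most recent value and writes it once the loop ends.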
In actual practice, you should probably use something fancier than an
integer to tag the data. You will also have to choose an appropriate tuple
structure for the composite key and value.
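One possible shape for "something fancier than an integer", sketched under the assumption that an enum's declaration order defines precedence. EnumTaggedKey, Source, BASE, OVERRIDE and JoinKey are hypothetical names. A real Hadoop key would also need to implement WritableComparable so it can be serialized; that part is omitted here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical enum-tagged composite key; Hadoop serialization omitted.
public class EnumTaggedKey {

    // Declaration order defines precedence: BASE sorts before OVERRIDE for the same key.
    enum Source { BASE, OVERRIDE }

    // Composite key as a Comparable tuple: compare on key first, then on source.
    record JoinKey(String key, Source source) implements Comparable<JoinKey> {
        @Override
        public int compareTo(JoinKey other) {
            int byKey = key.compareTo(other.key);
            return byKey != 0 ? byKey : source.compareTo(other.source); // enum compares by ordinal
        }
    }

    public static void main(String[] args) {
        List<JoinKey> keys = new ArrayList<>(List.of(
                new JoinKey("k", Source.OVERRIDE),
                new JoinKey("k", Source.BASE)));
        Collections.sort(keys);
        System.out.println(keys);
    }
}
```

Naming the tags this way documents which file wins, and adding a third source later only means adding an enum constant in the right position.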
Pig, Cascading, Plume and Hive would make this easier than straight Java,
but any of these approaches would work.
On Tue, Feb 8, 2011 at 4:26 PM, Gururaj S Mayya <[EMAIL PROTECTED]> wrote:
> Any pointers as to how this could be done?