Amandeep Khurana 2009-02-06, 09:34
Re: Hadoop job using multiple input files
Jeff Hammerbacher 2009-02-06, 09:55
You can get the file name for a task via the "map.input.file" property. For
the join you're doing, you could inspect this property and output (number,
name) and (number, address) as your (key, value) pairs, depending on the
file you're working with. Then you can do the combination in your reducer.
You could also check out the join package in contrib/utils,
but I'd say your job is simple enough that you'll get it done faster with
the above method.
This task would be a simple join in Hive, so you could consider using Hive
to manage the data and perform the join.
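The tagging approach described above can be sketched outside Hadoop as well. The following is a minimal Python simulation (not actual Hadoop API code) of that reduce-side join: the map step tags each record with its source file, standing in for the "map.input.file" check, and the reduce step pairs names with addresses by number. The file names and sample records are illustrative assumptions.

```python
from collections import defaultdict

def map_record(filename, line):
    # Emit (number, ("name", value)) or (number, ("address", value))
    # depending on which input file the record came from -- this is
    # the role "map.input.file" plays in a real Hadoop mapper.
    if filename == "names.txt":          # File 1: Name, Number
        name, number = line.split(",")
        return number.strip(), ("name", name.strip())
    else:                                # File 2: Number, Address
        number, address = line.split(",", 1)
        return number.strip(), ("address", address.strip())

def reduce_join(grouped):
    # For each number, combine the tagged name and address values.
    for number, values in grouped.items():
        fields = dict(values)
        if "name" in fields and "address" in fields:
            yield fields["name"], fields["address"]

# The shuffle phase is simulated by grouping mapper output by key.
inputs = [
    ("names.txt", "Alice, 1"),
    ("names.txt", "Bob, 2"),
    ("addrs.txt", "1, 12 Oak St"),
    ("addrs.txt", "2, 34 Elm St"),
]
grouped = defaultdict(list)
for fname, line in inputs:
    key, value = map_record(fname, line)
    grouped[key].append(value)

print(sorted(reduce_join(grouped)))
# [('Alice', '12 Oak St'), ('Bob', '34 Elm St')]
```

In a real job the tag check would live in the mapper (reading "map.input.file" from the JobConf), and the framework's shuffle would do the grouping for you.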
On Fri, Feb 6, 2009 at 1:34 AM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> Is it possible to write a map reduce job using multiple input files?
> For example:
> File 1 has data like - Name, Number
> File 2 has data like - Number, Address
> Using these, I want to create a third file which has something like - Name,
> Address
> How can a map reduce job be written to do this?
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz