Re: Hadoop job using multiple input files
Hey Amandeep,

You can get the file name for a task via the "map.input.file" property. For
the join you're doing, you could inspect this property and output (number,
name) or (number, address) as your (key, value) pair, depending on the
file you're working with. Then you can match names to addresses for each
number in your reducer.
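
To make that concrete, here is a rough sketch of the idea in plain Java, with no Hadoop classes involved. Everything in it is an assumption for illustration: the class name, the file names "names.txt" and "addresses.txt", the "N:"/"A:" tags, and the join() method that stands in for the framework's shuffle. In a real job the file name would come from the "map.input.file" property rather than a method argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Map step: use the source file name to decide which column is the
    // join key (the number) and tag the value with where it came from.
    // The file names here are hypothetical.
    static String[] map(String inputFile, String line) {
        String[] fields = line.split(",", 2); // split on the first comma only
        String first = fields[0].trim();
        String second = fields[1].trim();
        if (inputFile.endsWith("names.txt")) {
            // File 1 holds "Name, Number": emit (number, tagged name).
            return new String[] {second, "N:" + first};
        }
        // File 2 holds "Number, Address": emit (number, tagged address).
        return new String[] {first, "A:" + second};
    }

    // Reduce step: all values for one number arrive together, so pair
    // every name with every address seen for that number.
    static List<String> reduce(List<String> taggedValues) {
        List<String> names = new ArrayList<>();
        List<String> addresses = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("N:")) {
                names.add(v.substring(2));
            } else {
                addresses.add(v.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String name : names) {
            for (String address : addresses) {
                out.add(name + ", " + address);
            }
        }
        return out;
    }

    // Stand-in for the shuffle: group map output by key, then reduce each group.
    static List<String> join(Map<String, List<String>> files) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String line : file.getValue()) {
                String[] kv = map(file.getKey(), line);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> result = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            result.addAll(reduce(values));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new TreeMap<>();
        files.put("names.txt", Arrays.asList("Alice, 1", "Bob, 2"));
        files.put("addresses.txt", Arrays.asList("1, 12 Oak St", "2, 9 Elm Ave"));
        join(files).forEach(System.out::println);
    }
}
```

Tagging each value with its source is what lets the reducer tell names from addresses once both have been grouped under the same number. A number that shows up in only one file produces no output here, which is inner-join behaviour.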

You could also check out the join package in contrib/utils,
but I'd say your job is simple enough that you'll get it done faster with
the above method.

This task would be a simple join in Hive, so you could consider using Hive
to manage the data and perform the join.


On Fri, Feb 6, 2009 at 1:34 AM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:

> Is it possible to write a map reduce job using multiple input files?
> For example:
> File 1 has data like - Name, Number
> File 2 has data like - Number, Address
> Using these, I want to create a third file which has something like - Name,
> Address
> How can a map reduce job be written to do this?
> Amandeep
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz