Re: One file per mapper
On Tue, Jul 5, 2011 at 5:28 PM, Jim Falgout <[EMAIL PROTECTED]> wrote:

> I've done this before by placing the name of each file to process into a
> single file (newline separated) and using the NLineInputFormat class as the
> input format. Run your job with the single file with all of the file names
> to process as the input. Each mapper will then be handed one line (this is
> tunable) from the single input file. The line will contain the name of the
> file to process.
> You can also write your own InputFormat class that creates a split for each
> file.
> Both of these options have scalability issues, which raises the question: why
> one file per mapper?
> -----Original Message-----
> From: Govind Kothari [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 05, 2011 3:04 PM
> Subject: One file per mapper
> Hi,
> I am new to Hadoop. I have a set of files and I want to assign each file to
> a mapper. Also, in the mapper there should be a way to know the complete path
> of the file. Can you please tell me how to do that?
> Thanks,
> Govind
> --
> Govind Kothari
> Graduate Student
> Dept. of Computer Science
> University of Maryland College Park
> <---Seek Excellence, Success will Follow --->
You can also do this with the MultipleInputs and MultipleOutputs classes. With
MultipleInputs, each source file (input path) can be assigned its own mapper.
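
For the approach quoted above (a single newline-separated file of file names fed to NLineInputFormat), here is a minimal sketch of how that list file might be generated. ManifestWriter and its method names are hypothetical and not from this thread. The sketch also assumes the files sit in a local directory; for files on HDFS you would list them through Hadoop's FileSystem API instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not from the thread): builds the newline-separated
// list of file names that would serve as the job's single input file.
final class ManifestWriter {

    // Collect the absolute paths of the regular files directly under inputDir
    // (subdirectories are skipped), sorted so the manifest is deterministic.
    static List<String> listInputFiles(Path inputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.toAbsolutePath().toString()));
        }
        Collections.sort(names);
        return names;
    }

    // Write one path per line. With one line per split, each line would be
    // handed to a separate mapper as the name of the file to process.
    static void writeManifest(List<String> names, Path manifest) throws IOException {
        Files.write(manifest, names, StandardCharsets.UTF_8);
    }

    // Usage (assumed argument order): ManifestWriter <inputDir> <manifestFile>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ManifestWriter <inputDir> <manifestFile>");
            return;
        }
        List<String> names = listInputFiles(Paths.get(args[0]));
        writeManifest(names, Paths.get(args[1]));
        System.out.println("Wrote " + names.size() + " file names");
    }
}
```

The resulting file would then be the job's only input. With the newer org.apache.hadoop.mapreduce API, the one-line-per-mapper setting mentioned above corresponds to NLineInputFormat.setNumLinesPerSplit(job, 1). Each mapper receives a path as its record value, so it knows the complete path but has to open the file itself.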