Re: One file per mapper
Edward Capriolo 2011-07-06, 14:46
On Tue, Jul 5, 2011 at 5:28 PM, Jim Falgout <[EMAIL PROTECTED]> wrote:
> I've done this before by placing the name of each file to process into a
> single file (newline separated) and using the NLineInputFormat class as the
> input format. Run your job with the single file with all of the file names
> to process as the input. Each mapper will then be handed one line (this is
> tunable) from the single input file. The line will contain the name of the
> file to process.
>
> You can also write your own InputFormat class that creates a split for each
> file.
>
> Both of these options have scalability issues which begs the question: why
> one file per mapper?
>
> -----Original Message-----
> From: Govind Kothari [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 05, 2011 3:04 PM
> To: [EMAIL PROTECTED]
> Subject: One file per mapper
>
> Hi,
>
> I am new to hadoop. I have a set of files and I want to assign each file to
> a mapper. Also in mapper there should be a way to know the complete path of
> the file. Can you please tell me how to do that ?
>
> Thanks,
> Govind
>
> --
> Govind Kothari
> Graduate Student
> Dept. of Computer Science
> University of Maryland College Park
>
> <---Seek Excellence, Success will Follow --->
>

You can also do this with MultipleInputs and MultipleOutputs classes. Each
source file can have a different mapper.
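
For the manifest approach Jim describes in the quoted message, the first step is producing the newline-separated file of paths. Here is a minimal sketch of that step only, assuming Java 8 or later and using just the standard library. The class name ManifestWriter, the method writeManifest, and the use of local filesystem paths are my own assumptions for illustration; on a real cluster you would list HDFS paths and upload the manifest before running the job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hadoop): builds the newline-separated
// list of file names that would be used as the job's single input file.
final class ManifestWriter {

    // Writes one absolute path per line for every regular file directly
    // under inputDir, and returns the paths in the order they were written.
    static List<String> writeManifest(Path inputDir, Path manifest) throws IOException {
        Path manifestAbs = manifest.toAbsolutePath();
        List<String> paths;
        try (Stream<Path> entries = Files.list(inputDir)) {
            paths = entries
                    .filter(Files::isRegularFile)
                    // Skip the manifest itself if it lives in the same directory.
                    .filter(p -> !p.toAbsolutePath().equals(manifestAbs))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
        // Files.write terminates each element with the platform line separator.
        Files.write(manifest, paths, StandardCharsets.UTF_8);
        return paths;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ManifestWriter <inputDir> <manifestFile>");
            System.exit(1);
        }
        List<String> paths = writeManifest(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + paths.size() + " paths to " + args[1]);
    }
}
```

The resulting manifest is then the only input path given to the job, with NLineInputFormat as the input format. In the newer org.apache.hadoop.mapreduce API the "one line per mapper" setting should be NLineInputFormat.setNumLinesPerSplit(job, 1), but check that against the Hadoop version you are running. Each map call then receives a line of the manifest as its value, so the mapper already has the complete path of the file it is meant to process.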