Hadoop >> mail # user >> One file per mapper


Re: One file per mapper
On Tue, Jul 5, 2011 at 5:28 PM, Jim Falgout <[EMAIL PROTECTED]> wrote:

> I've done this before by placing the name of each file to process into a
> single file (newline separated) and using the NLineInputFormat class as the
> input format. Run your job with the single file with all of the file names
> to process as the input. Each mapper will then be handed one line (this is
> tunable) from the single input file. The line will contain the name of the
> file to process.
>
> You can also write your own InputFormat class that creates a split for each
> file.
>
> Both of these options have scalability issues, which raises the question:
> why one file per mapper?
>
> -----Original Message-----
> From: Govind Kothari [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 05, 2011 3:04 PM
> To: [EMAIL PROTECTED]
> Subject: One file per mapper
>
> Hi,
>
> I am new to Hadoop. I have a set of files and I want to assign each file to
> a mapper. Also, in the mapper there should be a way to know the complete
> path of the file. Can you please tell me how to do that?
>
> Thanks,
> Govind
>
> --
> Govind Kothari
> Graduate Student
> Dept. of Computer Science
> University of Maryland College Park
>
> <---Seek Excellence, Success will Follow --->
>
>
You can also do this with the MultipleInputs and MultipleOutputs classes:
each source file can be assigned a different mapper.
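
For the file-of-file-names approach quoted above, here is a rough sketch of the preparation step in plain Java. It uses only the standard library and the local filesystem, so it is not Hadoop code: the class and method names (FileListManifest, writeManifest, groupLines) are made up for illustration, and groupLines only imitates the idea of handing N lines to each mapper. In a real job the list file would normally live on HDFS and the input format would do the grouping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds the newline-separated list of file names
// that would be passed to the job as its single input file.
public class FileListManifest {

    // Writes the absolute path of every regular file in inputDir to
    // manifest, one path per line, and returns the paths written.
    // Keep the manifest outside inputDir so it does not list itself.
    static List<String> writeManifest(Path inputDir, Path manifest) throws IOException {
        List<String> paths;
        try (Stream<Path> files = Files.list(inputDir)) {
            paths = files.filter(Files::isRegularFile)
                         .map(p -> p.toAbsolutePath().toString())
                         .sorted()
                         .collect(Collectors.toList());
        }
        Files.write(manifest, paths);
        return paths;
    }

    // Illustration only: groups the manifest lines into chunks of
    // linesPerMapper, the way a line-based input format would give
    // each mapper a fixed number of lines.
    static List<List<String>> groupLines(List<String> lines, int linesPerMapper) {
        if (linesPerMapper < 1) {
            throw new IllegalArgumentException("linesPerMapper must be >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerMapper) {
            int end = Math.min(i + linesPerMapper, lines.size());
            groups.add(new ArrayList<>(lines.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: first argument is the input directory,
        // second is where to write the list of file names.
        Path inputDir = Path.of(args[0]);
        Path manifest = Path.of(args[1]);
        List<String> paths = writeManifest(inputDir, manifest);
        // With one line per mapper, each group holds exactly one file path.
        for (List<String> group : groupLines(paths, 1)) {
            System.out.println("mapper input: " + group);
        }
    }
}
```

With one line per mapper, the line each mapper receives is itself the complete path of the file to process, which also covers the second part of the original question.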