Re: One file per mapper?

Hi Terry,

If your files are smaller than the HDFS block size and you are using the
default TextInputFormat with its default split-size settings, each mapper
will get exactly one file.
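
For that small-file case nothing special is needed in the driver. A minimal
sketch (the input path and job name below are placeholders of mine; classes
are from org.apache.hadoop.mapreduce and org.apache.hadoop.mapreduce.lib.input):

    // Each file under /data/small-files that is smaller than one HDFS block
    // becomes a single split, so each mapper processes exactly one file.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "one-file-per-mapper");
    job.setInputFormatClass(TextInputFormat.class); // the default anyway
    FileInputFormat.addInputPath(job, new Path("/data/small-files"));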

If your files are larger than an HDFS block, take a look at the sample
implementation of WholeFileInputFormat in 'Hadoop - The Definitive Guide'
by Tom White:
http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
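
In case it helps, here is a rough sketch along the lines of that example
(new mapreduce API; not a verbatim copy of the book's code, and the class
names are my own). It prevents splitting and hands the whole file to the
mapper as a single BytesWritable value:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Input format that never splits a file and emits the whole file
    // as one (NullWritable, BytesWritable) record.
    public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false; // one split per file, regardless of size
      }

      @Override
      public RecordReader<NullWritable, BytesWritable> createRecordReader(
          InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
      }
    }

    // Record reader that reads the entire file into a single value.
    class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

      private FileSplit fileSplit;
      private Configuration conf;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false;

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (processed) {
          return false;
        }
        // Read the full file contents into the value in one go.
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
          in = fs.open(file);
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      @Override
      public NullWritable getCurrentKey() { return NullWritable.get(); }

      @Override
      public BytesWritable getCurrentValue() { return value; }

      @Override
      public float getProgress() { return processed ? 1.0f : 0.0f; }

      @Override
      public void close() {
        // nothing to close; the stream is closed in nextKeyValue()
      }
    }

Then set job.setInputFormatClass(WholeFileInputFormat.class) in your driver
and each map() call will see one complete file as its value.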

On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <[EMAIL PROTECTED]> wrote:

> Hello-
>
> I know that it is contrary to normal Hadoop operation, but how can I
> configure my M/R job to send one complete file to each mapper task? This
> is intended to be used on many files in the 1.5 MB range as the first
> step in a chain of processes.
>
> thanks.
>