MapReduce >> mail # user >> One file per mapper?

Re: One file per mapper?
thanks Bejoy.

...Feeling a bit foolish as Tom White's book was 2 feet away....

On 10/08/2012 10:28 AM, Bejoy Ks wrote:
> Hi Terry
> If your files are smaller than the HDFS block size and you are using
> the default TextInputFormat with the default split-size properties,
> each file already goes to exactly one mapper.
> If your files are larger than an HDFS block, take a look at the sample
> implementation of 'WholeFileInputFormat' in 'Hadoop: The Definitive
> Guide' by Tom White.
> http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
> On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <[EMAIL PROTECTED]
> <mailto:[EMAIL PROTECTED]>> wrote:
>     Hello-
>     I know that it is contrary to normal Hadoop operation, but how can I
>     configure my M/R job to send one complete file to each mapper task? This
>     is intended to be used on many files in the 1.5 MB range as the first
>     step in a chain of processes.
>     thanks.

Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
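
To make the rule Bejoy describes concrete, here is a rough sketch in plain Java (no Hadoop dependency) of how the number of mappers per file falls out of the file size and the block size. It is a simplified model, not Hadoop's actual code: the class name SplitModel and the method mappersFor are made up for illustration, the 64 MB block size is an assumption about the cluster, and the min/max split-size settings and the small slack Hadoop allows on the last split are ignored. The `splittable` flag stands in for what an input format such as WholeFileInputFormat achieves by declaring its files non-splittable.

```java
// Simplified model of how a file-based input format decides the number of
// input splits (and therefore map tasks) for a single file.
// Assumption: split size equals the HDFS block size; min/max split-size
// settings and the slack Hadoop allows on the last split are ignored.
final class SplitModel {

    /** Number of map tasks one file produces under this simplified model. */
    static long mappersFor(long fileLength, long blockSize, boolean splittable) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must not be negative");
        }
        // A non-splittable file (or an empty one) always goes to one mapper.
        if (!splittable || fileLength == 0) {
            return 1;
        }
        // Otherwise one split per block, rounding up for a partial last block.
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long small = 1_500_000L;            // roughly the 1.5 MB files from the question
        long large = 200L * 1024 * 1024;    // hypothetical 200 MB file

        System.out.println(mappersFor(small, blockSize, true));  // prints 1
        System.out.println(mappersFor(large, blockSize, true));  // prints 4
        System.out.println(mappersFor(large, blockSize, false)); // prints 1
    }
}
```

Under these assumptions the 1.5 MB files each get a single mapper with the default input format, so a custom input format is only needed once files grow past one block.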