Re: One file per mapper?
Bejoy Ks 2012-10-08, 14:28
Hi Terry
If your files are smaller than the HDFS block size and you are using the default TextInputFormat with the default split-size properties, each file already goes to exactly one mapper.

If your files are larger than an HDFS block, they will be split across several mappers. In that case, please take a look at the sample implementation of 'WholeFileInputFormat' in 'Hadoop - The Definitive Guide' by Tom White:

http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false

On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <[EMAIL PROTECTED]> wrote:
> Hello-
>
> I know that it is contrary to normal Hadoop operation, but how can I
> configure my M/R job to send one complete file to each mapper task? This
> is intended to be used on many files in the 1.5 MB range as the first
> step in a chain of processes.
>
> thanks.
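
To make the two cases in the reply concrete, here is a rough sketch of the split arithmetic. It is a simplified stand-alone model, not Hadoop code: the class SplitSketch and its methods are hypothetical names, the 64 MB block size is an assumption, and the formula is meant to mirror what FileInputFormat does when it computes splits (including its 10% slack on the last split). The `splittable` flag stands in for what a FileInputFormat subclass such as WholeFileInputFormat achieves by overriding isSplitable() to return false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of per-file split computation; not a Hadoop class.
public class SplitSketch {
    // Slack factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Effective split size derived from block size and the min/max split settings.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each split produced for one file.
    static List<Long> splitLengths(long fileLength, long blockSize,
                                   long minSize, long maxSize, boolean splittable) {
        List<Long> splits = new ArrayList<>();
        if (fileLength == 0 || !splittable) {
            // Empty or non-splittable file: one split covering the whole file.
            splits.add(fileLength);
            return splits;
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // tail of the file
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb; // assumed HDFS block size

        // A 1.5 MB file with default settings: one split, so one mapper.
        System.out.println(splitLengths(1536 * 1024, block, 1, Long.MAX_VALUE, true).size()); // 1

        // A 200 MB file with default settings: several splits.
        System.out.println(splitLengths(200 * mb, block, 1, Long.MAX_VALUE, true).size()); // 4

        // The same 200 MB file when the input format is not splittable: one split.
        System.out.println(splitLengths(200 * mb, block, 1, Long.MAX_VALUE, false).size()); // 1
    }
}
```

In this model, files in the 1.5 MB range always end up as a single split, so no custom input format is needed for them. A non-splittable format only changes the outcome for files larger than the split size.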