Re: One file per mapper?
Terry Healy 2012-10-08, 15:29
thanks Bejoy.
...Feeling a bit foolish as Tom White's book was 2 feet away....

On 10/08/2012 10:28 AM, Bejoy Ks wrote:
> Hi Terry
>
> If your files are smaller than the HDFS block size and you are
> using the default TextInputFormat with the default properties for split
> sizes, there will be just one file per mapper.
>
> If your files are larger than an HDFS block, please take a look at the
> sample implementation of 'WholeFileInputFormat' from
> 'Hadoop - The Definitive Guide' by Tom White.
> http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
>
> On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <[EMAIL PROTECTED]> wrote:
>
>     Hello-
>
>     I know that it is contrary to normal Hadoop operation, but how can I
>     configure my M/R job to send one complete file to each mapper task? This
>     is intended to be used on many files in the 1.5 MB range as the first
>     step in a chain of processes.
>
>     thanks.

--
Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
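
For anyone reading this thread later, here is a rough sketch of the arithmetic behind Bejoy's first point, i.e. why a 1.5 MB file ends up as a single split (and so a single mapper) under the default settings. It is plain Java with no Hadoop dependency and is not Hadoop's actual source. The class name SplitCountSketch and its methods are made up for illustration, the 64 MB block size is an assumed default for clusters of this era, and the 1.1 slop factor is assumed to mirror what FileInputFormat uses for the last split.

```java
// Illustrative only: a hypothetical stand-alone model of how a file-based
// input format could decide how many splits one file produces.
final class SplitCountSketch {

    // Assumption: the last chunk may exceed the split size by up to 10%
    // before it is broken off into its own split.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the configured
    // minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for a single non-empty file.
    // Simplification: empty files are rejected rather than modelled.
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++; // the tail (or the whole file, if it is small)
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed default block size
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);

        long smallFile = 3 * mb / 2; // roughly the 1.5 MB files from the question
        long largeFile = 200 * mb;

        System.out.println("1.5 MB file -> " + countSplits(smallFile, splitSize) + " split(s)");
        System.out.println("200 MB file -> " + countSplits(largeFile, splitSize) + " split(s)");
    }
}
```

Under these assumptions the small file stays in one piece, while the 200 MB file is cut into several splits. That second case is where the book's WholeFileInputFormat comes in: it overrides isSplitable to return false so that a file is never divided, whatever its size.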