You could, I think, just extend FileInputFormat and override isSplitable
to return false. Then each file won't be broken up into separate splits,
and will be processed as a whole by a single mapper. This is probably the
easiest thing to do, but if you have huge files, it won't perform very
well, since each file is then read by only one mapper.
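
A minimal sketch of that first option, assuming the new
(org.apache.hadoop.mapreduce) API and plain text input. The class name
WholeFileTextInputFormat is made up for illustration:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical name; any FileInputFormat subclass can override isSplitable.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split: each input file becomes exactly one split,
        // so one mapper sees the whole file, header included.
        return false;
    }
}
```

You would then register it on the job with
job.setInputFormatClass(WholeFileTextInputFormat.class).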
Alternatively, you can use Harsh's suggestion (thanks for that idea, I
didn't know it).
1) In the setup method of a mapper, you can get the file path: using
2) Then, in the mapper's "setup" method, you should be able to open a file
input stream and call "seek(0)" to read the file header, as Harsh says.
3) When you process the header, you can store the results from the setup
method in a field of the mapper class (a local variable would go out of
scope when setup returns), and the map method can read from that field and
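
Steps 1) to 3) could look roughly like the sketch below. It is only a
sketch under a few assumptions of mine: the job uses a FileInputFormat
(so the split can be cast to FileSplit, which is one common way to get
the path and not necessarily the one meant above), the header is a
single line of UTF-8 text, and the input is read with TextInputFormat so
the map key is the byte offset. HeaderAwareMapper and readHeader are
made-up names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper that reads the file header once per task.
public class HeaderAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Field (not a local variable) so that map() can see what setup() read.
    private String header;

    // Assumption: the header is the first line of the file.
    static String readHeader(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line = reader.readLine();
        return line == null ? "" : line;
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // 1) Path of the file this mapper's split belongs to.
        Path path = ((FileSplit) context.getInputSplit()).getPath();
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        // 2) Open the file and go to the start, wherever the split begins.
        try (FSDataInputStream in = fs.open(path)) {
            in.seek(0);
            // 3) Keep the parsed header for later map() calls.
            header = readHeader(in);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // With TextInputFormat the key is the byte offset, so offset 0
        // is the header line itself; skip it.
        if (key.get() == 0) {
            return;
        }
        // Illustrative output only: tag every record with its file's header.
        context.write(new Text(header), value);
    }
}
```

This keeps the file splittable, at the cost of one extra small read of
the start of the file per mapper.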
On Thu, Feb 27, 2014 at 9:09 PM, Fengyun RAO <[EMAIL PROTECTED]> wrote: