I ran into an interesting problem while implementing my own custom InputFormat,
which extends FileInputFormat. (I rewrote the RecordReader class but not
the InputSplit class.)
My RecordReader treats the following format as one basic record. (My
RecordReader extends LineRecordReader. It returns a record when it reaches
#Trailer# and the collected lines contain #Header#. I have only one input
file, and it is composed of many of the following basic records.)
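The record rule described above can be sketched in plain Java, outside Hadoop, as a grouping pass over lines. This is only an illustration under assumptions: the markers are assumed to be whole lines equal to `#Header#` and `#Trailer#`, and the class and method names are hypothetical. Note what it implies: a fragment that reaches `#Trailer#` without a `#Header#`, or a record that never reaches its `#Trailer#`, is silently dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the record rule; not the poster's actual RecordReader.
public class HeaderTrailerGrouping {
    // Assumption: markers occupy whole lines.
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    /** Emits a record when a #Trailer# line is reached and the lines
        collected since the previous trailer include a #Header# line. */
    static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String line : lines) {
            buffer.add(line);
            if (line.equals(TRAILER)) {
                if (buffer.contains(HEADER)) {
                    records.add(String.join("\n", buffer));
                }
                buffer.clear(); // a fragment without a header is discarded here
            }
        }
        return records; // lines after the last trailer are never emitted
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "body of a record cut off at the front", TRAILER, // no header: dropped
            HEADER, "a", "b", TRAILER,                        // complete record
            HEADER, TRAILER,                                  // record with 0 body lines
            HEADER, "record cut off at the end");             // no trailer: dropped
        System.out.println(group(lines).size()); // prints 2
    }
}
```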
.....(many lines; there may be 0 lines or 1000 lines, it varies)
Everything works fine if the number of basic records in the file is an
integer multiple of the number of mappers. For example, I use 2 mappers and
there are two basic records in my input file, or I use 3 mappers and there
are 6 basic records in the input file.
However, if I use 4 mappers and there are 3 basic records in the input
file (not an integer multiple), the final output is incorrect. The "Map Input
Bytes" job counter is also less than the input file size. How can I
fix this? Do I need to rewrite the InputSplit?
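A likely cause is records that straddle split boundaries. The plain-Java simulation below is a sketch, not Hadoop code: `splits` cuts the file into equal byte ranges (a simplification of how FileInputFormat sizes splits), and all class and method names are hypothetical. The naive reader only looks at lines starting inside its split, so a record cut by a boundary loses either its #Trailer# or its #Header# and is dropped. The boundary-aware reader applies the same convention LineRecordReader uses for lines, but at record granularity: a split owns every record whose #Header# line starts inside it, and keeps reading past its end until that record's #Trailer#.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of split boundaries; byte offsets assume ASCII text.
public class SplitBoundaryDemo {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    static List<Integer> lineStarts(String file) {
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        while (pos < file.length()) {
            starts.add(pos);
            int nl = file.indexOf('\n', pos);
            pos = (nl < 0) ? file.length() : nl + 1;
        }
        return starts;
    }

    static String lineAt(String file, int start) {
        int nl = file.indexOf('\n', start);
        return file.substring(start, nl < 0 ? file.length() : nl);
    }

    /** Equal byte ranges {start, end}; the last range takes the remainder. */
    static List<int[]> splits(int length, int numSplits) {
        List<int[]> result = new ArrayList<>();
        int size = length / numSplits;
        for (int i = 0; i < numSplits; i++) {
            int start = i * size;
            result.add(new int[] {start, (i == numSplits - 1) ? length : start + size});
        }
        return result;
    }

    /** Naive: only lines starting in [start, end); a record cut by a boundary is lost. */
    static List<String> readSplitNaive(String file, int start, int end) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (int s : lineStarts(file)) {
            if (s < start || s >= end) continue;
            String line = lineAt(file, s);
            if (line.equals(HEADER)) {
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
                if (line.equals(TRAILER)) { records.add(current.toString()); current = null; }
            }
        }
        return records;
    }

    /** Boundary-aware: owns records whose header starts in [start, end); reads past end to the trailer. */
    static List<String> readSplit(String file, int start, int end) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (int s : lineStarts(file)) {
            String line = lineAt(file, s);
            if (current == null) {
                if (s >= end) break;                 // no record may begin at or after the split end
                if (s >= start && line.equals(HEADER)) current = new StringBuilder(line);
            } else {
                current.append('\n').append(line);
                if (line.equals(TRAILER)) { records.add(current.toString()); current = null; }
            }
        }
        return records;
    }

    static int countRecords(String file, int numSplits, boolean boundaryAware) {
        int total = 0;
        for (int[] sp : splits(file.length(), numSplits)) {
            total += (boundaryAware ? readSplit(file, sp[0], sp[1])
                                    : readSplitNaive(file, sp[0], sp[1])).size();
        }
        return total;
    }

    static String buildFile(int numRecords) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numRecords; i++) {
            sb.append(HEADER).append('\n').append("line ").append(i).append("-a\n")
              .append("line ").append(i).append("-b\n").append(TRAILER).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String file = buildFile(3); // 3 records of 37 bytes each
        System.out.println("naive, 3 splits: " + countRecords(file, 3, false));          // 3
        System.out.println("naive, 4 splits: " + countRecords(file, 4, false));          // 0
        System.out.println("boundary-aware, 4 splits: " + countRecords(file, 4, true)); // 3
    }
}
```

With 3 equal-sized records and 3 splits, every boundary happens to fall between records, so the naive reader works, much like the integer-multiple cases in the post. With 4 splits, every record is cut and the naive reader returns nothing. The InputSplit itself does not need to change for this fix, only the reader's start and end handling. A simpler alternative, at the cost of parallelism, is to override FileInputFormat's `isSplitable` method to return `false`, so the whole file goes to one mapper.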
Any reply will be appreciated!