Calling sync for every record in SequenceFile.Writer
Saptarshi Guha 2012-11-12, 18:26
For a given part file (e.g. part-m-0000), I would like to record the
position of each key written to this file.
To get this position, I wrote something
where out is a SequenceFile.Writer.
Now, if I leave the first line uncommented, for small files, getLength()
does not change from key to key.
If I call sync for every key, it changes to accurately reflect the
Is there some other function I can use to get the current position (like a
file's 'tell' function)?
But wouldn't calling sync for every record be costly?
How much? (I don't expect an answer to the last question.)
If it makes a difference, I have block compression turned on.
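
A toy model may help show why getLength() can stay flat between keys when
records are buffered into blocks. This is only a sketch and not Hadoop's code:
the class BlockBufferedWriterSketch, its blockSize threshold and its record
encoding are all hypothetical. It assumes that a block-compressing writer holds
records in memory and only writes them to the file when the block fills up or
sync() is called.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: NOT Hadoop's SequenceFile.Writer. The buffering behaviour
// below is an assumption made for illustration.
class BlockBufferedWriterSketch {
    // Stands in for the bytes already written to the part file.
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    // Records appended but not yet written out.
    private final ByteArrayOutputStream block = new ByteArrayOutputStream();
    // Hypothetical flush threshold in bytes.
    private final int blockSize;

    BlockBufferedWriterSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    void append(String key, String value) {
        byte[] rec = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        block.write(rec, 0, rec.length);
        if (block.size() >= blockSize) {
            sync(); // the block is full, so write it out
        }
    }

    // Writes any pending records to the "file" and empties the block.
    void sync() {
        if (block.size() == 0) {
            return;
        }
        byte[] pending = block.toByteArray();
        file.write(pending, 0, pending.length);
        block.reset();
    }

    // Only counts bytes that have reached the "file", like a 'tell'.
    long getLength() {
        return file.size();
    }

    // Appends three 6-byte records and reports getLength() after each one.
    static long[] lengths(boolean syncEachRecord) {
        BlockBufferedWriterSketch out = new BlockBufferedWriterSketch(1024);
        long[] seen = new long[3];
        for (int i = 0; i < 3; i++) {
            out.append("k" + i, "v" + i);
            if (syncEachRecord) {
                out.sync();
            }
            seen[i] = out.getLength();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lengths(false))); // [0, 0, 0]
        System.out.println(Arrays.toString(lengths(true)));  // [6, 12, 18]
    }
}
```

In this model the reported length only moves when a block is written out, so
syncing per record gives per-record positions at the price of one block per
record. In the real writer each sync also adds a sync marker to the file, and
with block compression each block would then hold a single record.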
I noticed that MapFile.Writer does something similar (it calls getLength) and
would reduce to the above operation, i.e. calling getLength for every key-value
pair, if I set the index interval to 1. So would this impact MapFile.Writer?
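
To make the MapFile question concrete, here is a rough simulation of an index
that records the data file's length before every N-th append. It is a sketch
under stated assumptions, not MapFile.Writer's actual logic: the class
IndexIntervalSketch and its parameters are hypothetical, and it assumes that
the data file's length only advances when a buffered block is written out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation only: NOT Hadoop's MapFile.Writer.
class IndexIntervalSketch {
    // Returns the data-file positions an index would record when it samples
    // the flushed length before every indexInterval-th append.
    static List<Long> indexPositions(int records, int indexInterval,
                                     int blockSize, int recordBytes) {
        List<Long> index = new ArrayList<>();
        long flushed = 0; // bytes already written to the data file
        long pending = 0; // bytes still buffered in the current block
        for (int i = 0; i < records; i++) {
            if (i % indexInterval == 0) {
                index.add(flushed); // analogous to calling getLength()
            }
            pending += recordBytes;
            if (pending >= blockSize) { // block full: write it out
                flushed += pending;
                pending = 0;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Six 6-byte records, with a hypothetical 12-byte block.
        System.out.println(indexPositions(6, 1, 12, 6)); // [0, 0, 12, 12, 24, 24]
        System.out.println(indexPositions(6, 2, 12, 6)); // [0, 12, 24]
    }
}
```

With an interval of 1 the model asks for the length on every pair, and keys
that share a block get the same position. In this model that means more index
entries, not more precise ones.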