Re: LineReader, Buffering for FileInputFormat
Saptarshi Guha 2009-08-09, 23:43
Thank you. Is 64KB a good choice? From experience, there is a trade-off between
large chunks and the time taken to read each chunk. I wonder if a larger value
would be better.

On Sun, Aug 9, 2009 at 7:41 PM, Harold Valdivia Garcia <[EMAIL PROTECTED]> wrote:

> You can see these two files:
>
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java?revision=796148
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/util/LineReader.java?revision=786726
>
> I think it doesn't access the disk every time it reads a line.
>
> LineReader reads 64 KB into a buffer, and then tries to parse the data into
> lines.
>
> On Sun, Aug 9, 2009 at 6:38 PM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>> I am using the TextInputFormat and its associated LineReader. In the
>> RecordReader for this class,
>> it reads key and value, using LineReader.
>> My question is: does LineReader hit the disk every time it needs to read a
>> line?
>> I notice it uses DataInputStream; does that do some internal buffering?
>>
>> I guess it would be a performance hit if LineReader read from disk every
>> time it needs to fetch a line,
>> so I'm guessing it reads a chunk and parses lines from the chunk, but I
>> didn't see that happening.
>>
>> I am using Hadoop 0.20
>>
>> Any comments would be appreciated.
>>
>> Regards
>> Saptarshi
>
> --
> ******************************************
> Harold Dwight Valdivia Garcia
> Graduate Student
> M.S Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> ******************************************
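
For anyone following the thread, here is a minimal, self-contained sketch of the "read a chunk, then parse lines out of it" idea described above. It is not Hadoop's LineReader: the names ChunkedLineReaderSketch, CountingStream, ChunkedLineReader, sampleData and countLinesAndReads are made up for illustration, it only splits on '\n', and it reads from an in-memory stream rather than a real file. It simply counts how often the underlying stream is touched for a given buffer size, which is one rough way to think about the buffer-size question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkedLineReaderSketch {

    /** Hypothetical wrapper that counts calls to the underlying stream. */
    static class CountingStream extends InputStream {
        private final InputStream in;
        int reads = 0;
        CountingStream(InputStream in) { this.in = in; }
        @Override public int read() throws IOException { reads++; return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return in.read(b, off, len);
        }
    }

    /** Simplified reader: refill a fixed-size buffer, then split it on '\n'. */
    static class ChunkedLineReader {
        private final InputStream in;
        private final byte[] buffer;
        private int pos = 0;  // next unread byte in buffer
        private int len = 0;  // number of valid bytes in buffer

        ChunkedLineReader(InputStream in, int bufferSize) {
            this.in = in;
            this.buffer = new byte[bufferSize];
        }

        /** Returns the next line without its terminator, or null at end of input. */
        String readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            boolean sawData = false;
            while (true) {
                if (pos >= len) {                      // buffer exhausted: refill
                    len = in.read(buffer, 0, buffer.length);
                    pos = 0;
                    if (len <= 0) {                    // end of input
                        len = 0;
                        return sawData ? decode(line) : null;
                    }
                }
                int start = pos;
                while (pos < len && buffer[pos] != '\n') pos++;
                line.write(buffer, start, pos - start);
                sawData = true;
                if (pos < len) {                       // stopped at a newline
                    pos++;                             // skip the '\n'
                    return decode(line);
                }
                // otherwise the line continues into the next chunk
            }
        }

        private static String decode(ByteArrayOutputStream bytes) {
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Builds n lines of the form "line-i\n" (illustrative test data). */
    static byte[] sampleData(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("line-").append(i).append('\n');
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Returns {lines parsed, reads issued to the underlying stream}. */
    static int[] countLinesAndReads(byte[] data, int bufferSize) {
        CountingStream counting = new CountingStream(new ByteArrayInputStream(data));
        ChunkedLineReader reader = new ChunkedLineReader(counting, bufferSize);
        int lines = 0;
        try {
            while (reader.readLine() != null) lines++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { lines, counting.reads };
    }

    public static void main(String[] args) {
        byte[] data = sampleData(100_000);
        for (int size : new int[] { 4 * 1024, 64 * 1024, 1024 * 1024 }) {
            int[] r = countLinesAndReads(data, size);
            System.out.println(size + " byte buffer: " + r[0] + " lines, " + r[1] + " reads");
        }
    }
}
```

The number of reads scales with input size divided by buffer size, not with the number of lines. Whether a buffer larger than 64KB actually helps on a real cluster depends on the file system and disks underneath, so it would need to be measured rather than inferred from this sketch.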