Hadoop, mail # user - LineReader, Buffering for FileInputFormat


Re: LineReader, Buffering for FileInputFormat
Saptarshi Guha 2009-08-09, 23:43
Thank you. Is 64KB a good choice? In my experience, there is a trade-off between
chunk size and the time taken to read each chunk.
I wonder if a larger value would be better.
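
To make the buffering question concrete, here is a minimal sketch in plain Java of the read-a-chunk-then-split-lines idea. It is not Hadoop's LineReader: the class names (ChunkedLineReaderSketch, CountingInputStream, ChunkedLineReader) and the helpers readAll and countReads are hypothetical, and it only splits on '\n'. It counts how often the underlying stream is asked for bytes, so the effect of different buffer sizes can be compared on one's own data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedLineReaderSketch {

    /** Counts how often the wrapped source is asked for bytes. */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        int readCalls = 0;
        CountingInputStream(InputStream in) { this.in = in; }
        @Override public int read() throws IOException { readCalls++; return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            readCalls++;
            return in.read(b, off, len);
        }
    }

    /** Hypothetical reader: refills a fixed buffer, then splits lines in memory. */
    static final class ChunkedLineReader {
        private final InputStream in;
        private final byte[] buffer;
        private int pos = 0, limit = 0;
        private boolean eof = false;

        ChunkedLineReader(InputStream in, int bufferSize) {
            if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be > 0");
            this.in = in;
            this.buffer = new byte[bufferSize];
        }

        /** Refills the buffer with one bulk read; returns false once the source is exhausted. */
        private boolean fill() throws IOException {
            if (eof) return false;
            int n = in.read(buffer, 0, buffer.length);
            if (n <= 0) { eof = true; return false; }
            pos = 0;
            limit = n;
            return true;
        }

        /** Returns the next line without its '\n', or null at end of input. */
        String readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            boolean sawData = false;
            while (true) {
                if (pos >= limit && !fill()) {
                    return sawData ? decode(line) : null;
                }
                sawData = true;
                int start = pos;
                while (pos < limit && buffer[pos] != '\n') pos++;
                line.write(buffer, start, pos - start);
                if (pos < limit) {   // stopped on a newline: consume it and return
                    pos++;
                    return decode(line);
                }
                // otherwise the line continues into the next chunk
            }
        }

        private static String decode(ByteArrayOutputStream bytes) {
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Reads every line from data using the given buffer size. */
    static List<String> readAll(byte[] data, int bufferSize) {
        try {
            ChunkedLineReader r = new ChunkedLineReader(new ByteArrayInputStream(data), bufferSize);
            List<String> lines = new ArrayList<>();
            for (String s = r.readLine(); s != null; s = r.readLine()) lines.add(s);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns how many read calls reached the underlying stream. */
    static int countReads(byte[] data, int bufferSize) {
        try {
            CountingInputStream src = new CountingInputStream(new ByteArrayInputStream(data));
            ChunkedLineReader r = new ChunkedLineReader(src, bufferSize);
            while (r.readLine() != null) { /* drain */ }
            return src.readCalls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) sb.append("row ").append(i).append('\n');
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        for (int size : new int[] {4 * 1024, 64 * 1024, 1024 * 1024}) {
            System.out.println(size + " byte buffer: " + countReads(data, size) + " underlying reads");
        }
    }
}
```

The number of underlying reads depends on the buffer size and total bytes, not on the number of lines. Whether a larger buffer actually helps on a real file system is something this in-memory sketch cannot show; it would need to be measured against the actual input.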

On Sun, Aug 9, 2009 at 7:41 PM, Harold Valdivia Garcia <
[EMAIL PROTECTED]> wrote:

> You can see these two files:
>
>
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/LineRecordReader.java?revision=796148
>
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/src/java/org/apache/hadoop/util/LineReader.java?revision=786726
>
> I think it doesn't access the disk every time it reads a line.
>
> LineReader reads 64 KB into a buffer and then parses the lines out of that
> buffer.
>
>
>
>
> On Sun, Aug 9, 2009 at 6:38 PM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>> I am using the TextInputFormat and its associated LineReader. In the
>> RecordReader for this class,
>> it reads key and value, using LineReader.
>> My question is: does LineReader hit the disk every time it needs to read a
>> line?
>> I notice it uses DataInputStream. Does that do some internal buffering?
>>
>> I guess it would be a performance hit if LineReader read from disk every
>> time it needs to fetch a line,
>> so I'm guessing it reads a chunk and parses lines from the chunk, but I
>> didn't see that happening.
>>
>> I am using Hadoop 0.20
>>
>> Any comments would be appreciated.
>>
>> Regards
>> Saptarshi
>>
>
>
>
> --
> ******************************************
> Harold Dwight Valdivia Garcia
> Graduate Student
> M.S Computer Engineering
> University of Puerto Rico, Mayaguez Campus
> ******************************************
>