Re: Input splits for sequence file input
Jeff Zhang 2012-12-03, 02:08
The record reader returned by createRecordReader handles the record boundary issue. You can check the code for details.

On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I was reading about the relationship between input splits and HDFS blocks and
> a question came up:
>
> If a logical record crosses an HDFS block boundary, let's say block#1 and
> block#2, does the mapper assigned this input split ask for (1) both
> blocks, or (2) block#1 and just the part of block#2 that this logical
> record extends to, or (3) block#1 and part of block#2 up to some sync point
> that covers this particular logical record? Note the input is a sequence
> file.
>
> I guess my question really is: does Hadoop operate on a block basis or
> does it respect some sort of logical structure within a block when it's
> trying to feed the mappers with input data?
>
> Cheers
>
> Jeff

--
Best Regards

Jeff Zhang
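
For readers who do not want to dig through the source, here is a toy Python simulation of the boundary handling. It is not Hadoop's code: the names ENTRIES, FILE_LENGTH and read_split, and all offsets, are made up for illustration. It assumes, as I understand the sequence file reader's behaviour, that a reader whose split does not start at byte 0 skips forward to the first sync marker at or after its split start, and that every reader keeps reading past its split end until it meets a sync marker at or beyond that end.

```python
# Toy model only: this is NOT Hadoop's code. Offsets and sizes are made up.
SYNC = None  # hypothetical stand-in for a sequence-file sync marker

# (byte offset, payload); a payload of SYNC marks a sync point.
ENTRIES = [
    (0, "r0"), (30, "r1"), (60, "r2"),
    (90, SYNC),
    (100, "r3"), (130, "r4"), (160, "r5"),
    (190, SYNC),
    (200, "r6"), (230, "r7"),
]
FILE_LENGTH = 260


def read_split(entries, start, end):
    """Return the records a reader assigned bytes [start, end) would emit."""
    i = 0
    if start > 0:
        # Skip ahead to the first sync marker at or after the split start.
        while i < len(entries) and not (
            entries[i][0] >= start and entries[i][1] is SYNC
        ):
            i += 1
    records = []
    while i < len(entries):
        pos, payload = entries[i]
        if payload is SYNC:
            if pos >= end:
                # A sync marker at or past the split end: the next reader
                # starts from here, so this reader stops.
                break
            i += 1  # a sync marker inside the split is simply skipped
            continue
        records.append(payload)
        i += 1
    return records


# The split boundary at byte 120 falls in the middle of record r3.
first = read_split(ENTRIES, 0, 120)
second = read_split(ENTRIES, 120, FILE_LENGTH)
print(first)   # ['r0', 'r1', 'r2', 'r3', 'r4', 'r5']
print(second)  # ['r6', 'r7']
```

In this model the first reader runs past byte 120 up to the sync marker at byte 190, and the second reader starts from that same marker, so every record is emitted exactly once. That corresponds most closely to option (3) in the question.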