
Hadoop user mailing list

Jay Vyas 2012-12-03, 05:52
Jeff LI 2012-12-02, 22:03
Re: Input splits for sequence file input
Hi Jeff,

This has been asked several times before (please check the archives at
http://search-hadoop.com).

The answer is (3) for SequenceFiles, which are read from sync point to sync
point (HDFS blocks themselves have no notion of records), and (2) as a general
answer for other formats (e.g. text files).
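To make the two cases concrete, here is a toy Java simulation. It assumes a tiny 8-byte "block"/split size and made-up file contents, and the class and method names (`SplitReadingSketch`, `readLines`, `readSyncGroups`) are hypothetical. This is not Hadoop's code. It only loosely mirrors the boundary rules that, as I understand them, Hadoop's `LineRecordReader` (option 2) and `SequenceFileRecordReader` (option 3) follow. In the SequenceFile-like model, `|` stands in for a sync marker and `;` ends a record.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReadingSketch {

    // Option (2), text files: a split skips its first (possibly partial) line
    // unless it starts at offset 0, then reads every line that begins at or
    // before `end`, even if that line runs past `end` into the next block.
    static List<String> readLines(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // The previous split owns the line that straddles `start`.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            if (nl < 0) nl = data.length();
            records.add(data.substring(pos, nl));
            pos = nl + 1;
        }
        return records;
    }

    // Option (3), SequenceFile-like toy format: '|' marks a sync point and
    // ';' ends a record. A split starts at the first sync marker at or after
    // `start` (or at offset 0) and stops at the first sync marker at or after
    // `end`, so it may read past `end` up to that marker.
    static List<String> readSyncGroups(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = (start == 0) ? 0 : data.indexOf('|', start);
        if (pos < 0) return records; // no sync point left for this split
        while (pos < data.length()) {
            if (data.charAt(pos) == '|') {
                if (pos >= end) break; // the next split begins at this marker
                pos++;                 // step over the sync marker
                continue;
            }
            int semi = data.indexOf(';', pos);
            if (semi < 0) semi = data.length();
            records.add(data.substring(pos, semi));
            pos = semi + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "alpha\nbravo\ncharlie\ndelta\n";
        String seq = "|a1;a2;a3;|b1;b2;|c1;c2;c3;";
        int splitSize = 8; // deliberately tiny split size for illustration
        for (int s = 0; s < text.length(); s += splitSize) {
            int e = Math.min(s + splitSize, text.length());
            System.out.println("text [" + s + "," + e + ") -> " + readLines(text, s, e));
        }
        for (int s = 0; s < seq.length(); s += splitSize) {
            int e = Math.min(s + splitSize, seq.length());
            System.out.println("seq  [" + s + "," + e + ") -> " + readSyncGroups(seq, s, e));
        }
    }
}
```

With these inputs the text splits yield `[alpha, bravo]`, `[charlie]`, `[delta]` and `[]`, and the sync-marker splits yield `[a1, a2, a3]`, `[b1, b2]`, `[c1, c2, c3]` and `[]`. Every record is read exactly once, even though `bravo` and `a3` straddle the first 8-byte boundary. The reader of the first split simply keeps reading past its end (to the newline, or to the next sync marker), and the reader of the next split skips ahead to the same point.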

On Mon, Dec 3, 2012 at 3:33 AM, Jeff LI <[EMAIL PROTECTED]> wrote:
> Hello,
> I was reading about the relationship between input splits and HDFS blocks,
> and a question occurred to me:
> If a logical record crosses an HDFS block boundary, say between block#1 and
> block#2, does the mapper assigned this input split ask for (1) both
> blocks, (2) block#1 and just the part of block#2 that this logical record
> extends into, or (3) block#1 and the part of block#2 up to some sync point
> that covers this particular logical record? Note that the input is a
> SequenceFile.
> I guess my question really is: does Hadoop operate on a block basis, or does
> it respect some sort of logical structure within a block when it feeds input
> data to the mappers?
> Cheers
> Jeff

Harsh J