HDFS, mail # user - Re: Input splits for sequence file input


Re: Input splits for sequence file input
Mahesh Balija 2012-12-03, 06:51
Hi Jeff,

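            Beyond the HDFS blocks, there is something called an
*InputSplit/FileSplit* (in your terms, the logical structure).
            The Mapper operates on InputSplits through a RecordReader, and
the RecordReader is specific to the InputFormat.
            The InputFormat computes the splits and supplies the RecordReader,
which parses the input and generates key-value pairs.

            The RecordReader also handles records that cross a
FileSplit boundary (i.e., span different blocks). For a sequence file, the
reader starts at the first sync point after the split start and keeps reading
past the split end until it reaches the next sync point, so typically only
part of the following block is read.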
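
            The boundary handling can be sketched with a toy model. This is
not Hadoop's API: the class SplitBoundaryDemo, the Rec type, the offsets, and
the 100-byte split size are all made up for illustration. It only assumes the
rule that a reader starts at the first sync point at or after its split start
and stops at the first sync point at or beyond its split end.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this does not use Hadoop's classes or API.
final class SplitBoundaryDemo {

    // Hypothetical record: the byte offset where it (or the sync marker
    // in front of it) starts, and whether a sync marker precedes it.
    static final class Rec {
        final long offset;
        final boolean syncBefore;
        final String value;

        Rec(long offset, boolean syncBefore, String value) {
            this.offset = offset;
            this.syncBefore = syncBefore;
            this.value = value;
        }
    }

    // Returns the values emitted by a reader assigned bytes [start, end).
    static List<String> readSplit(List<Rec> file, long start, long end) {
        List<String> out = new ArrayList<>();
        int i = 0;
        // Skip forward to the first sync point at or after the split start.
        while (i < file.size()
                && !(file.get(i).syncBefore && file.get(i).offset >= start)) {
            i++;
        }
        for (; i < file.size(); i++) {
            Rec r = file.get(i);
            // Stop at the first sync point at or beyond the split end;
            // records before it are read even if they lie past the end.
            if (r.syncBefore && r.offset >= end) {
                break;
            }
            out.add(r.value);
        }
        return out;
    }

    // Made-up file: six 40-byte records, a sync marker before every second one.
    static List<Rec> sampleFile() {
        List<Rec> file = new ArrayList<>();
        file.add(new Rec(0, true, "r0"));
        file.add(new Rec(40, false, "r1"));
        file.add(new Rec(80, true, "r2"));   // spans the boundary at byte 100
        file.add(new Rec(120, false, "r3"));
        file.add(new Rec(160, true, "r4"));
        file.add(new Rec(200, false, "r5"));
        return file;
    }

    public static void main(String[] args) {
        List<Rec> file = sampleFile();
        System.out.println(readSplit(file, 0, 100));   // [r0, r1, r2, r3]
        System.out.println(readSplit(file, 100, 200)); // [r4, r5]
        System.out.println(readSplit(file, 200, 240)); // []
    }
}
```

            In this model the first split reads on into the second block up to
the sync point at byte 160, the second split starts there, and every record is
emitted exactly once. That corresponds to option (3) in the question below.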

            Please check this link for more information,
http://wiki.apache.org/hadoop/HadoopMapReduce

Best,
Mahesh Balija,
Calsoft Labs.

On Mon, Dec 3, 2012 at 3:33 AM, Jeff LI <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I was reading about the relationship between input splits and HDFS blocks
> and a question came up:
>
> If a logical record crosses HDFS block boundary, let's say block#1 and
> block#2, does the mapper assigned to this input split ask for (1) both
> blocks, or (2) block#1 and just the part of block#2 that this logical
> record extends to, or (3) block#1 and part of block#2 up to some sync point
> that covers this particular logical record?  Note the input is sequence
> file.
>
> I guess my question really is: does Hadoop operate on a block basis or
> does it respect some sort of logical structure within a block when it's
> trying to feed the mappers with input data?
>
> Cheers
>
> Jeff
>
>