Hadoop, mail # user - Input splits for sequence file input


Re: Input splits for sequence file input
Jay Vyas 2012-12-03, 05:52
This question is fundamentally flawed: it assumes that a mapper will ask for anything.

The mapper class's "run" method reads from a record reader. The question you really should ask is:

How does a RecordReader read records across block boundaries?
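
For reference, the driver loop is roughly the sketch below, simplified from org.apache.hadoop.mapreduce.Mapper in the new API (the class name RunLoopSketch is only illustrative). Every call on the Context is delegated to the RecordReader that the InputFormat created for the task's input split; the mapper itself never touches blocks.

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RunLoopSketch<KIN, VIN, KOUT, VOUT> extends Mapper<KIN, VIN, KOUT, VOUT> {
      @Override
      public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        // The framework pulls records one at a time; each call below is
        // forwarded to the RecordReader for this task's input split.
        while (context.nextKeyValue()) {
          map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
        cleanup(context);
      }
    }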

Jay Vyas
http://jayunit100.blogspot.com

On Dec 2, 2012, at 9:08 PM, Jeff Zhang <[EMAIL PROTECTED]> wrote:

> The record reader created by createRecordReader will handle the record boundary issue. You can check the code for details.
>
> On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I was reading about the relationship between input splits and HDFS blocks, and a question came up:
>>
>> If a logical record crosses an HDFS block boundary, say between block #1 and block #2, does the mapper assigned to this input split ask for (1) both blocks, or (2) block #1 and just the part of block #2 that this logical record extends into, or (3) block #1 and the part of block #2 up to some sync point that covers this particular logical record?  Note that the input is a sequence file.
>>
>> I guess my question really is: does Hadoop operate on a block basis, or does it respect some sort of logical structure within a block when it feeds the mappers their input data?
>>
>> Cheers
>>
>> Jeff
>
>
>
> --
> Best Regards
>
> Jeff Zhang
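
For anyone who finds this thread later: the behaviour is closest to option (3) in the original question. Below is a rough sketch, not the actual Hadoop source, of how a sync-aware record reader (such as the one behind SequenceFileInputFormat.createRecordReader) treats split and block boundaries. The class name SyncAwareReaderSketch and the initialize/nextKeyValue signatures are invented for illustration; SequenceFile.Reader and its sync(), getPosition(), next() and syncSeen() methods are the real API. The reader seeks to the first sync marker at or after the split start, always reads whole records (the HDFS input stream transparently pulls bytes from the next block when a record straddles a block boundary), and stops only after it has read past the split end and crossed a sync marker, so every record is handled by exactly one mapper.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;

    public class SyncAwareReaderSketch {
      private SequenceFile.Reader in;
      private long end;             // byte offset at which this split ends
      private boolean more = true;

      public void initialize(Configuration conf, Path file,
                             long splitStart, long splitLength) throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        in = new SequenceFile.Reader(fs, file, conf);
        end = splitStart + splitLength;
        if (splitStart > in.getPosition()) {
          in.sync(splitStart);      // seek to the first sync marker at or after the split start
        }
      }

      public boolean nextKeyValue(Writable key, Writable value) throws IOException {
        if (!more) {
          return false;
        }
        long pos = in.getPosition();
        // next() always returns a complete record; if it spills into the next
        // HDFS block, the underlying stream fetches those bytes transparently.
        boolean gotRecord = in.next(key, value);
        // Stop at end of file, or once a read that started at or past the split
        // end has crossed a sync marker: that record belongs to the next split.
        if (!gotRecord || (pos >= end && in.syncSeen())) {
          more = false;
        }
        return more;
      }

      public void close() throws IOException {
        in.close();
      }
    }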