Hadoop >> mail # user >> Input splits for sequence file input


Re: Input splits for sequence file input
This question is fundamentally flawed: it assumes that a mapper will ask for anything.

The mapper class's "run" method reads from a RecordReader.  The question you really should ask is:

How does a RecordReader read records across block boundaries?

Jay Vyas
http://jayunit100.blogspot.com
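To make Jay's point concrete, here is a toy re-creation of the Mapper/RecordReader contract (shape only, not the actual Hadoop source; the class and method names below are illustrative stand-ins): the framework's run() loop pulls key/value pairs from whatever RecordReader the InputFormat supplied, so the mapper itself never sees blocks or byte ranges.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MapperLoopSketch {
    // Minimal stand-in for Hadoop's RecordReader: hands out key/value pairs
    // one at a time, regardless of how the bytes are laid out on disk.
    interface RecordReader<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    // A reader backed by an in-memory list, standing in for a reader that
    // decodes records from a split (and past its end, if a record crosses it).
    static class ListReader implements RecordReader<Integer, String> {
        private final Iterator<SimpleEntry<Integer, String>> it;
        private SimpleEntry<Integer, String> current;

        ListReader(List<SimpleEntry<Integer, String>> records) {
            it = records.iterator();
        }
        public boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        public Integer getCurrentKey() { return current.getKey(); }
        public String getCurrentValue() { return current.getValue(); }
    }

    // The shape of the framework's run() loop: pull records until the reader
    // says there are none left. The mapper never asks for blocks.
    static List<String> run(RecordReader<Integer, String> reader) {
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleEntry<Integer, String>> recs = List.of(
            new SimpleEntry<>(1, "a"), new SimpleEntry<>(2, "b"));
        System.out.println(run(new ListReader(recs)));  // prints [1:a, 2:b]
    }
}
```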

On Dec 2, 2012, at 9:08 PM, Jeff Zhang <[EMAIL PROTECTED]> wrote:

> The method createRecordReader will handle the record-boundary issue. You can check the code for details.
>
> On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I was reading about the relationship between input splits and HDFS blocks, and a question occurred to me:
>>
>> If a logical record crosses an HDFS block boundary, say between block #1 and block #2, does the mapper assigned to this input split ask for (1) both blocks, (2) block #1 and just the part of block #2 that this logical record extends into, or (3) block #1 and the part of block #2 up to some sync point that covers this particular logical record?  Note the input is a sequence file.
>>
>> I guess my question really is: does Hadoop operate on a block basis, or does it respect some sort of logical structure within a block when it feeds input data to the mappers?
>>
>> Cheers
>>
>> Jeff
>
>
>
> --
> Best Regards
>
> Jeff Zhang