Re: How does mapper process partial records?

Thanks for the response.

From http://wiki.apache.org/hadoop/HadoopMapReduce:

> For example TextInputFormat will read the last line of the FileSplit past
> the split boundary and when reading other than the first FileSplit,
> TextInputFormat ignores the content up to the first newline.

When the first record in a split other than the first split is complete
and does not span a boundary, then based on the above logic that
particular record is not processed by the mapper.
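
One way to check this case is a small simulation of the rule quoted above.
The sketch below is not Hadoop's code. read_split is a hypothetical helper,
and it assumes two things: every split except the first discards content up
to and including the first newline, and a reader keeps reading as long as
the next line starts at or before its split's end offset (so it reads one
line past the boundary). Under those assumptions, a record that starts
exactly on a split boundary is emitted by the reader of the preceding split
rather than dropped.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Simulate a line reader over the byte range [start, end) of data.

    Illustrative only; the boundary rules are assumptions stated above.
    """
    pos = start
    if start != 0:
        # Not the first split: skip up to and including the first newline.
        nl = data.find(b"\n", pos)
        if nl == -1:
            return []
        pos = nl + 1

    records = []
    # Assumed rule: a line is read if it starts at or before the split end,
    # which lets the reader run one line past the boundary.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


if __name__ == "__main__":
    data = b"aaa\nbbb\nccc\nddd\n"  # records start at offsets 0, 4, 8, 12

    # Boundary at offset 8, exactly where the record "ccc" starts.
    print(read_split(data, 0, 8))   # [b'aaa', b'bbb', b'ccc']
    print(read_split(data, 8, 16))  # [b'ddd']

    # Boundary at offset 6, in the middle of the record "bbb".
    print(read_split(data, 0, 6))   # [b'aaa', b'bbb']
    print(read_split(data, 6, 16))  # [b'ccc', b'ddd']
```

In both layouts each record comes out exactly once across the two splits,
which is the property the two rules are meant to give when taken together.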

Cloudera Certified Developer for Apache Hadoop CDH4 (95%)

If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.
On Fri, Jan 25, 2013 at 12:52 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi Praveen,
> This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce
> [Map section].
> On Thu, Jan 24, 2013 at 10:20 PM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > HDFS splits the file across record boundaries. So, how does the mapper
> > processing the second block (b2) determine that the first record is
> > incomplete and should process starting from the second record in the
> > block (b2)?
> >
> > Thanks,
> > Praveen
> --
> Harsh J