Thanks for the response.
> For example TextInputFormat will read the last line of the FileSplit past
> the split boundary and when reading other than the first FileSplit,
> TextInputFormat ignores the content up to the first newline.
Suppose the first record in a split other than the first split is
complete, i.e. it starts exactly at the split boundary and does not span
it. Based on the above logic, the reader skips everything up to the first
newline, so it seems this record would not be processed by any mapper.
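
To make the concern concrete, here is a small Python sketch of the rule as
quoted. It is a toy model, not Hadoop code: read_split, back_up and the
sample data are names I made up for illustration. With back_up=False it
follows the quoted text literally and drops a record that starts exactly on
the boundary. With back_up=True the reader first steps back one byte, so the
newline it skips is the one that ends the previous record. That variant is
my assumption about how a reader could avoid the problem; the wiki text does
not say it.

```python
# Toy model of line-oriented split reading; this is not Hadoop's code.
# read_split and back_up are hypothetical names used only for illustration.

def read_split(data, start, end, back_up):
    """Return the lines a reader for the byte range [start, end) would emit."""
    pos = start
    if start != 0:
        if back_up:
            # Assumption: step back one byte so that a newline sitting at
            # start - 1 is the one consumed by the skip below.
            pos = start - 1
        # Discard everything up to and including the first newline.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # A line that begins before `end` belongs to this split, even if it
    # runs past the split boundary.
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


data = b"aaa\nbbb\nccc\nddd\n"
aligned = [(0, 8), (8, 16)]    # "ccc" starts exactly at the boundary
spanning = [(0, 6), (6, 16)]   # "bbb" spans the boundary


def read_all(splits, back_up):
    """Concatenate the lines emitted for every split, in order."""
    return [line for s, e in splits for line in read_split(data, s, e, back_up)]


print(read_all(aligned, back_up=False))
print(read_all(aligned, back_up=True))
print(read_all(spanning, back_up=True))
```

In this sketch the literal reading loses "ccc" for the aligned splits, while
the back-up variant emits every record exactly once in both layouts.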
Cloudera Certified Developer for Apache Hadoop CDH4 (95%)
If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.
On Fri, Jan 25, 2013 at 12:52 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Praveen,
> This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce
> [Map section].
> On Thu, Jan 24, 2013 at 10:20 PM, Praveen Sripati
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > HDFS splits the file into blocks without regard to record boundaries. So,
> > how does the mapper processing the second block (b2) determine that the
> > first record is incomplete and that it should start processing from the
> > second record in b2?
> > Thanks,
> > Praveen
> Harsh J