

Re: What if file format is dependent upon first few lines?
A mapper's record reader implementation need not be strictly
restricted to the boundaries of its input split. The relationship is
loose: you can always seek(0), read the header lines you need to
prepare, then seek(offset) to the start of your split and continue
reading from there.
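
To make the seek(0) / seek(offset) idea concrete, here is a minimal
sketch. It is not a Hadoop RecordReader: it uses plain
java.io.RandomAccessFile on a local file as a stand-in for the seekable
stream you would open inside a record reader. The class name
HeaderAwareSplitReader, its method names, and the fixed headerLines
count are all hypothetical, and it assumes ASCII text with '\n' line
endings. Each reader takes the lines whose first byte falls inside its
own split, so every data line is read exactly once across splits.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of any Hadoop API.
class HeaderAwareSplitReader {

    /** Seeks to the start of the file and reads up to headerLines lines. */
    static List<String> readHeader(RandomAccessFile in, int headerLines) throws IOException {
        in.seek(0);
        List<String> header = new ArrayList<>();
        for (int i = 0; i < headerLines; i++) {
            String line = in.readLine(); // bytes are mapped 1:1 to chars (fine for ASCII)
            if (line == null) {
                break; // file ended before the full header was read
            }
            header.add(line);
        }
        return header;
    }

    /**
     * Returns the data lines that begin inside the split [start, start + length).
     * The header is always read first, whichever split this reader owns.
     */
    static List<String> readSplit(String path, long start, long length, int headerLines)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            List<String> header = readHeader(in, headerLines);
            // A real reader would use `header` here to decide how to parse the records.
            long headerEnd = in.getFilePointer();
            long end = start + length;

            // Never re-read the header as data, even if the split begins inside it.
            long pos = Math.max(start, headerEnd);
            if (pos > headerEnd) {
                // Back up one byte and discard through the next newline. If pos is
                // already at a line start this consumes only the preceding '\n';
                // otherwise it skips the partial line owned by the previous split.
                in.seek(pos - 1);
                in.readLine();
            } else {
                in.seek(pos);
            }

            List<String> records = new ArrayList<>();
            // A line that starts before `end` is read in full, even if it runs past `end`.
            while (in.getFilePointer() < end) {
                String line = in.readLine();
                if (line == null) {
                    break; // end of file
                }
                records.add(line);
            }
            return records;
        }
    }
}
```

For a file with two header lines followed by the data lines 1,2 / 3,4 /
5,6, a split that ends in the middle of "3,4" still returns that whole
line, and the next split skips the tail of it and starts at "5,6". The
cost of this approach is that every mapper re-reads the first few bytes
of the file, which is usually negligible next to the split itself.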

Apache Avro (http://avro.apache.org) uses a similar format: the file
header contains the schema a reader needs in order to work.

On Thu, Feb 27, 2014 at 1:59 AM, Fengyun RAO <[EMAIL PROTECTED]> wrote:

Harsh J