Hadoop >> mail # user >> What if file format is dependent upon first few lines?


Re: What if file format is dependent upon first few lines?
A mapper's record reader implementation does not have to confine its
reads to the input split boundaries. The relationship is a loose one:
the reader can always seek(0), read the lines it needs to determine
the format, then seek(offset) to the start of its split and continue
reading from there.
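
As a rough illustration of that seek pattern, here is a minimal sketch in plain Python against a local file. It is not Hadoop code: read_split, HEADER_LINES and demo are hypothetical names, and the two-line header and newline-delimited records are assumptions made for the example. In a real job the same steps would live in a custom RecordReader, using seek() on the stream opened for the split's file.

```python
import os
import tempfile

HEADER_LINES = 2  # assumption: the format is described by the first two lines


def read_split(path, start, length, header_lines=HEADER_LINES):
    """Return (header, records) for the byte range [start, start + length)."""
    end = start + length
    with open(path, "rb") as f:
        # Step 1: seek(0) and read the header, whichever split we own.
        f.seek(0)
        header = [f.readline().rstrip(b"\n").decode() for _ in range(header_lines)]
        data_start = f.tell()

        # Step 2: seek to where this split's records begin.
        if start <= data_start:
            # The split starts inside or right after the header.
            f.seek(data_start)
        else:
            # Step back one byte and discard up to the next newline, so a
            # record that begins exactly at `start` is kept, while a record
            # cut by the boundary is left to the previous split.
            f.seek(start - 1)
            f.readline()

        # Step 3: read every record that begins before `end`, even if its
        # last bytes lie beyond the split.
        records = []
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            records.append(line.rstrip(b"\n").decode())
        return header, records


def demo(split_size=10):
    """Cut a small sample file into fixed-size splits and read each one."""
    content = b"format=csv\ncols=a,b\n1,2\n3,4\n5,6\n7,8\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(content)
    try:
        size = len(content)
        return [
            read_split(tmp.name, start, min(split_size, size - start))
            for start in range(0, size, split_size)
        ]
    finally:
        os.unlink(tmp.name)


if __name__ == "__main__":
    for header, records in demo():
        print(header, records)
```

Every split sees the same header, and each record is returned by exactly one split: the one in which its first byte falls. The cost is one extra read of the start of the file per split, which is small as long as the header is short.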

Apache Avro (http://avro.apache.org) uses a similar layout: the file
header contains the schema a reader needs before it can decode any
records.

On Thu, Feb 27, 2014 at 1:59 AM, Fengyun RAO <[EMAIL PROTECTED]> wrote:

Harsh J

 