Hadoop >> mail # user >> What if file format is dependent upon first few lines?


+
Fengyun RAO 2014-02-27, 10:00
+
java8964 2014-02-27, 14:17
+
Harsh J 2014-02-27, 14:18
+
Fengyun RAO 2014-02-28, 02:09
Re: What if file format is dependent upon first few lines?

You could, I think, just extend FileInputFormat and override isSplitable()
to return false. Then each file won't be broken up into separate splits, and
each one is processed as a whole by a single mapper. This is probably the
easiest thing to do, but if you have huge files it won't perform very well,
since one mapper has to read each file from start to end.
Alternatively, you can use Harsh's suggestion (thanks for that idea, I didn't
know it):

1) In the setup method of a mapper, you can get the file path using

((FileSplit) context.getInputSplit()).getPath();

2) Then, still in the mapper's "setup" method, you should be able to open a
file input stream on that path and call "seek(0)" to read the file header, as
Harsh says (a freshly opened stream already starts at offset 0, so the seek
is only a safeguard).

3) Once you have processed the header, you can store the result in a field of
the mapper class (assigned in the setup method), and map() can read from that
field and proceed. A variable local to setup would not be visible in map().
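
To make step 3 a bit more concrete, here is a rough sketch of the header
parsing on its own, in plain Java so it can be tried without a cluster. The
class name HeaderSniffer and the "#Fields:" header layout are made up for
illustration (the thread does not say what the real header looks like). The
Hadoop wiring is only shown in a comment, since FileSystem.open() returns an
FSDataInputStream, which is an ordinary InputStream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reads a file header once so per-record code can reuse it. */
final class HeaderSniffer {

    // Assumed header directive, for illustration only.
    private static final String FIELDS_PREFIX = "#Fields:";

    /**
     * Reads the leading lines that start with '#' and returns the column names
     * listed after the last "#Fields:" directive among them, or an empty list
     * if there is none. Reading stops at the first non-header line.
     */
    static List<String> readFieldNames(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> fields = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null && line.startsWith("#")) {
            if (line.startsWith(FIELDS_PREFIX)) {
                String rest = line.substring(FIELDS_PREFIX.length()).trim();
                if (!rest.isEmpty()) {
                    fields = Arrays.asList(rest.split("\\s+"));
                }
            }
        }
        return fields;
    }

    // In a mapper, setup() could call it roughly like this (not compiled here,
    // needs the Hadoop jars on the classpath):
    //
    //   Path path = ((FileSplit) context.getInputSplit()).getPath();
    //   FileSystem fs = path.getFileSystem(context.getConfiguration());
    //   try (InputStream in = fs.open(path)) {      // starts at offset 0
    //       this.fieldNames = HeaderSniffer.readFieldNames(in);
    //   }
    //
    // where fieldNames is a field of the mapper that map() reads later.
}
```

Every mapper that works on a split of the same file would repeat this small
read of the file's beginning, which is usually cheap compared to processing
the split itself. Note that the mapper handling the first split will still
see the header lines as ordinary records and has to skip them.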
On Thu, Feb 27, 2014 at 9:09 PM, Fengyun RAO <[EMAIL PROTECTED]> wrote:

Jay Vyas
http://jayunit100.blogspot.com

 
+
Fengyun RAO 2014-02-28, 13:28