MapReduce >> mail # user >> Using a custom FileSplitter?


Using a custom FileSplitter?
Assume I have one of the following two situations (in fact I have both):

1) I have a directory with several hundred files. Some fraction of them
(say, the ones ending in ".foo") need to be passed to the mapper, and the
others can be ignored. Assume I am unable or unwilling to create a
directory containing only the files I need. How do I set up a custom file
splitter in Java code to filter my files?
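[Editor's note: for the filtering case, a custom splitter may not be necessary at all; Hadoop's FileInputFormat accepts a pluggable PathFilter. A minimal sketch, assuming the new (org.apache.hadoop.mapreduce) API; the class name FooOnlyJob and the ".foo" suffix are illustrative:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FooOnlyJob {

    /** Accepts only paths whose names end in ".foo". Sufficient for the
     *  flat-directory case described above; with nested input directories
     *  the filter would also need to accept directory paths so listing
     *  can descend into them. */
    public static class FooFilter implements PathFilter {
        @Override
        public boolean accept(Path path) {
            return acceptsName(path.getName());
        }
    }

    /** Pure helper so the suffix rule is easy to test in isolation. */
    static boolean acceptsName(String name) {
        return name.endsWith(".foo");
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "foo-only");
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Only paths accepted by FooFilter become input splits.
        FileInputFormat.setInputPathFilter(job, FooFilter.class);
        // ... set mapper, reducer, and output path, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

[The same setInputPathFilter hook exists in the old (org.apache.hadoop.mapred) API on JobConf.]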

2) Assume I have a collection of files which are not splittable, so I will
use one file per mapper. Assume that special code is required to read each
file and convert it into lines of text, and that I have Java code to do
that. Same question: how do I install a custom file splitter that decodes
files in a custom manner?
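[Editor's note: the usual pattern here is a FileInputFormat subclass whose isSplitable() returns false, paired with a RecordReader that reads the whole file and applies the custom decoding. A sketch, again assuming the new (org.apache.hadoop.mapreduce) API; decode(...) is a hypothetical placeholder for the questioner's own conversion code:]

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // one split, and hence one mapper, per file
    }

    @Override
    public RecordReader<NullWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader extends RecordReader<NullWritable, Text> {
        private FileSplit split;
        private TaskAttemptContext context;
        private final Text value = new Text();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) return false;
            // Read the entire (non-splittable) file into memory.
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(context.getConfiguration());
            FSDataInputStream in = fs.open(file);
            try {
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            value.set(decode(contents));
            processed = true;
            return true;
        }

        /** Hypothetical hook: replace with the custom decoding that turns
         *  the raw bytes into lines of text. Identity decode shown here. */
        static String decode(byte[] raw) {
            return new String(raw, java.nio.charset.StandardCharsets.UTF_8);
        }

        @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}
```

[Set it on the job with job.setInputFormatClass(WholeFileInputFormat.class); the mapper then receives one record containing the decoded contents of one file. Note that this loads each file fully into memory, which is only reasonable for modest file sizes.]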

--
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA