MapReduce >> mail # user >> Handling files with unclear boundaries


Handling files with unclear boundaries
Hello list,

     I need some guidance on how to handle files that have no proper
delimiters or record boundaries. I am trying to process a set of files
that are totally alien to me (SAS XPT files) through MR. The one thing
that is always fixed is that each time I have to read 107 bytes from
the line. Is it possible to use this length as a delimiter for
creating splits somehow? And if so, which InputFormat would be
appropriate? Many thanks.

Regards,
    Mohammad Tariq
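
To make the idea in the question concrete, here is a minimal sketch of how split boundaries could be aligned to a fixed record length, so that no 107-byte record straddles two splits. It is plain Java with no Hadoop dependency; the class name FixedLengthSplits and the method computeSplits are hypothetical names chosen for illustration, and the file and split sizes in main are made-up values. If your Hadoop release includes org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat, that class may already cover this case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): computes byte ranges whose
// boundaries always fall on a multiple of the fixed record length.
public class FixedLengthSplits {

    /** Returns {start, length} pairs covering the file, aligned to recordLength. */
    public static List<long[]> computeSplits(long fileLength, long targetSplitSize, int recordLength) {
        if (fileLength < 0 || targetSplitSize <= 0 || recordLength <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Round the target split size down to a whole number of records,
        // keeping at least one record per split.
        long recordsPerSplit = Math.max(1, targetSplitSize / recordLength);
        long alignedSize = recordsPerSplit * recordLength;

        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += alignedSize) {
            // The last split may be shorter than alignedSize.
            long length = Math.min(alignedSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example values: a 1070-byte file (10 records of 107 bytes)
        // and a target split size of 500 bytes.
        for (long[] s : computeSplits(1070, 500, 107)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

A custom InputFormat could apply the same arithmetic when generating its splits, and its RecordReader would then read exactly 107 bytes per record from the start of each split. This assumes the file length is a whole multiple of the record length; a trailing partial record would need separate handling.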