

Handling files with unclear boundaries
Hello list,

     I need some guidance on how to handle files that have no proper
delimiters or record boundaries. I am trying to process a set of files
in a format that is completely unfamiliar to me (SAS XPT files) through
MR. The one thing that is always fixed is that each time I have to
read 107 bytes. Is it possible to use this length as a delimiter for
creating splits somehow? If so, which InputFormat would be
appropriate? Many thanks.

Regards,
    Mohammad Tariq
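
To make the question concrete, here is a minimal sketch of the arithmetic behind "using the length as a delimiter": choosing split boundaries that always fall on a multiple of the record length, so no 107-byte record straddles two splits. It is plain Java with no Hadoop dependency. The class name FixedRecordSplits, the Range type, and the alignedSplits method are hypothetical names chosen for illustration, not part of any Hadoop API, and dropping a trailing partial record is an assumption rather than something the message specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes record-aligned byte ranges for a file
// made of fixed-length records (107 bytes in the question above).
final class FixedRecordSplits {

    // Half-open byte range [start, end) that covers whole records only.
    static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<Range> alignedSplits(long fileLength, int recordLength, long targetSplitSize) {
        if (fileLength < 0 || recordLength <= 0 || targetSplitSize <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        // Round the target split size down to a whole number of records,
        // but never below one record.
        long recordsPerSplit = Math.max(1, targetSplitSize / recordLength);
        long splitSize = recordsPerSplit * recordLength;

        // Assumption: a trailing partial record (if any) is ignored.
        long usableLength = (fileLength / recordLength) * recordLength;

        List<Range> splits = new ArrayList<>();
        for (long start = 0; start < usableLength; start += splitSize) {
            long end = Math.min(start + splitSize, usableLength);
            splits.add(new Range(start, end));
        }
        return splits;
    }
}
```

For example, alignedSplits(1070, 107, 400) rounds the 400-byte target down to 321 bytes (three records) and yields four ranges starting at offsets 0, 321, 642 and 963. A custom InputFormat could apply the same rounding when it builds its splits, and its record reader would then read 107 bytes at a time within each range.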