
Handling files with unclear boundaries
Hello list,

     I need some guidance on how to handle files that have no proper
delimiters or record boundaries. I am trying to process a set of files
in a format that is totally alien to me (SAS XPT files) through MR. The
one thing that is always fixed is that each time I have to read exactly
107 bytes. Is it possible to use this length as the record size when
creating splits? And if so, which InputFormat would be appropriate?
Many thanks.

    Mohammad Tariq
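
For illustration, here is a minimal sketch of the arithmetic a fixed-length record reader would need. It is plain Java with no Hadoop dependency. The class and method names are hypothetical, and it assumes that records start at byte 0 of the file and that a record belongs to the split in which its first byte falls. A real XPT file may carry header records, which would shift these offsets. Later Hadoop releases ship a FixedLengthInputFormat that works along similar lines, but this sketch does not use it.

```java
public class FixedLengthSplits {

    /** Offset of the first record that starts at or after splitStart. */
    public static long firstRecordStart(long splitStart, int recordLength) {
        long rem = splitStart % recordLength;
        // Already aligned: start here. Otherwise skip the tail of the record
        // that began in the previous split.
        return rem == 0 ? splitStart : splitStart + (recordLength - rem);
    }

    /**
     * Number of complete records whose first byte lies inside
     * [splitStart, splitStart + splitLength). The reader is expected to read
     * past the split end to finish the last record it owns.
     */
    public static long recordsInSplit(long splitStart, long splitLength,
                                      long fileLength, int recordLength) {
        long first = firstRecordStart(splitStart, recordLength);
        long splitEnd = Math.min(splitStart + splitLength, fileLength);
        if (first >= splitEnd) {
            return 0; // no record starts in this split
        }
        // Records that start in [first, splitEnd), rounded up.
        long count = (splitEnd - first + recordLength - 1) / recordLength;
        // Drop a truncated record at the very end of the file.
        if (first + count * recordLength > fileLength) {
            count--;
        }
        return count;
    }

    public static void main(String[] args) {
        int recordLength = 107;          // record size from the question
        long fileLength = 107L * 10;     // hypothetical file: ten records
        long splitSize = 400;            // deliberately not a multiple of 107

        for (long start = 0; start < fileLength; start += splitSize) {
            System.out.println("split at " + start
                    + ": first record at " + firstRecordStart(start, recordLength)
                    + ", records = "
                    + recordsInSplit(start, splitSize, fileLength, recordLength));
        }
    }
}
```

With these numbers the three splits own 4, 4 and 2 records, so every record is read exactly once even though the split size is not a multiple of 107.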
Manoj Khangaonkar 2012-08-06, 18:18
syed kather 2012-08-06, 18:24
Mohammad Tariq 2012-08-06, 19:22
rahul p 2012-08-06, 15:45