Re: Handling files with unclear boundaries
Thank you, guys.

Syed: thank you for the pointer.

Regards,
    Mohammad Tariq
On Mon, Aug 6, 2012 at 11:54 PM, syed kather <[EMAIL PROTECTED]> wrote:
> Hi Tariq,
>
>    Have a look at this link, which should guide you.
> There was a discussion about the same type of issue previously:
>
> search-hadoop.com/m/ydCoSysmTd1
>
> Syed Abdul kather
> Sent from Samsung S3
>
> On Aug 6, 2012 11:48 PM, "Manoj Khangaonkar" <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> I think you might need to extend FileInputFormat (or one of its
>> derived classes) as well as implement a RecordReader.
>>
>> regards
>>
>> On Mon, Aug 6, 2012 at 8:30 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> > Hello list,
>> >
>> >      I need some guidance on how to handle files where we don't have
>> > any proper delimiters or record boundaries. I am trying to process a
>> > set of files that are totally alien to me (SAS XPT files) through MR.
>> > The one thing that is always fixed is that each time I have to read
>> > 107 bytes. Is it possible to use this length as a delimiter for
>> > creating splits somehow? And if so, which InputFormat would be
>> > appropriate? Many thanks.
>> >
>> > Regards,
>> >     Mohammad Tariq
>>
>>
>>
>> --
>> http://khangaonkar.blogspot.com/
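
For reference, a minimal sketch of the FileInputFormat + RecordReader route
Manoj describes, assuming each record is exactly 107 bytes. The class names
(XptInputFormat, XptRecordReader) are made up for illustration; splitting is
disabled to keep the split-boundary arithmetic out of the picture, so each
file is read by a single mapper, and the reader emits the record's byte
offset as the key and the raw 107 bytes as the value. Newer Hadoop releases
also ship a FixedLengthInputFormat that covers this case, if your
distribution has it.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Illustrative names: XptInputFormat / XptRecordReader are not part of Hadoop.
public class XptInputFormat extends FileInputFormat<LongWritable, BytesWritable> {

    public static final int RECORD_LENGTH = 107;   // fixed record size in bytes

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // one mapper per file; avoids records straddling splits
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new XptRecordReader();
    }

    public static class XptRecordReader
            extends RecordReader<LongWritable, BytesWritable> {

        private FSDataInputStream in;
        private long start, end, pos;
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Configuration conf = context.getConfiguration();
            start = split.getStart();
            end = start + split.getLength();
            pos = start;
            FileSystem fs = split.getPath().getFileSystem(conf);
            in = fs.open(split.getPath());
            in.seek(start);
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (pos + RECORD_LENGTH > end) {
                return false;                   // no complete record left
            }
            byte[] record = new byte[RECORD_LENGTH];
            in.readFully(record);               // read exactly one 107-byte record
            key.set(pos);                       // key = byte offset of this record
            value.set(record, 0, RECORD_LENGTH);
            pos += RECORD_LENGTH;
            return true;
        }

        @Override
        public LongWritable getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            return end == start ? 1.0f : (pos - start) / (float) (end - start);
        }

        @Override
        public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}

The mapper then receives one 107-byte BytesWritable per call and can decode
the SAS XPT fields itself; the job would set the input format with
job.setInputFormatClass(XptInputFormat.class).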