Re: Which FileInputFormat to use for fixed length records?
I think these would be good to add to mapreduce in the
{{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
attach a patch!
- Aaron

On Wed, Oct 28, 2009 at 11:15 AM, yz5od2 <[EMAIL PROTECTED]> wrote:

> Hi all,
> I am working on writing a FixedLengthInputFormat class and a corresponding
> FixedLengthRecordReader.
>
> Would the Hadoop Common project have interest in these? Basically, these
> are for reading textual record data where each record is a fixed length
> (no carriage returns, separators, etc.).
>
> thanks
>
>
>
> On Oct 20, 2009, at 11:00 PM, Aaron Kimball wrote:
>
>  You'll need to write your own, I'm afraid. You should subclass
>> FileInputFormat and go from there. You may want to look at TextInputFormat /
>> LineRecordReader for an example of how an IF/RR gets put together, but
>> there isn't an existing fixed-length record reader.
>>
>> - Aaron
>>
>> On Tue, Oct 20, 2009 at 12:59 PM, yz5od2 <[EMAIL PROTECTED]> wrote:
>>
>>  Hi,
>>> I have input files that contain NO carriage returns/line feeds. Each
>>> record is a fixed length (202 bytes).
>>>
>>> Which FileInputFormat should I be using, so that each call to my Mapper
>>> receives one K,V pair where the key is null or something (I don't care)
>>> and the value is the 202-byte record?
>>>
>>> thanks!
>>>
>>>
>
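The approach the thread settles on (subclass FileInputFormat and pair it with a custom RecordReader) ultimately comes down to one core loop: repeatedly read exactly recordLength bytes until the input is exhausted. Below is a minimal, Hadoop-free sketch of that loop; the class and method names are illustrative only, not part of any Hadoop API. A real FixedLengthInputFormat would additionally need to either disable splitting (return false from isSplitable) or align split boundaries to multiples of the record length, so that no record straddles two mappers.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedLengthDemo {

    // Core logic a FixedLengthRecordReader would perform per split:
    // read exactly recordLength bytes per record until input runs out.
    // A trailing partial record is silently dropped here; a real
    // implementation might prefer to log or fail on one.
    static List<byte[]> readRecords(InputStream in, int recordLength) {
        List<byte[]> records = new ArrayList<>();
        DataInputStream din = new DataInputStream(in);
        byte[] buf = new byte[recordLength];
        try {
            while (true) {
                din.readFully(buf);       // throws EOFException on a short read
                records.add(buf.clone()); // copy: buf is reused each iteration
            }
        } catch (EOFException eof) {
            // End of input: no more complete records.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two complete 4-byte records plus a 2-byte partial tail,
        // with no separators anywhere, as in the original question.
        byte[] data = "AAAABBBBCC".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> recs = readRecords(new ByteArrayInputStream(data), 4);
        System.out.println(recs.size() + " records"); // prints "2 records"
        for (byte[] r : recs) {
            System.out.println(new String(r, StandardCharsets.US_ASCII));
        }
    }
}
```

In the Hadoop version, each `buf.clone()` would instead populate a `BytesWritable` (or `Text`) value, with a `NullWritable` or byte-offset key, matching the null-key/202-byte-value pairing the original poster asked for.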