Re: Fileformat query
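Sorry, my mistake: writing your own InputFormat does not seem like a good
idea after all. Each split starts at an arbitrary byte offset, so finding
the line number at the start of a split means counting the newlines in all
the data before it, and that cost is fairly high.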
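
To make the cost concrete, here is a rough sketch in plain Java with no
Hadoop classes involved. The class and method names (LineNumberSketch,
lineNumberAt, fixedWidthLineNumber) are made up for illustration and are not
part of any Hadoop API. It assumes lines end with a single '\n' byte and that
the data fits in memory, which a real HDFS file would not.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Hadoop.
class LineNumberSketch {

    // Zero-based line number of the line that starts at byteOffset.
    // Every byte before the offset has to be scanned for '\n', which is
    // the work a record reader would repeat for the data ahead of its split.
    static long lineNumberAt(byte[] data, long byteOffset) {
        long newlines = 0;
        for (int i = 0; i < byteOffset && i < data.length; i++) {
            if (data[i] == '\n') {
                newlines++;
            }
        }
        return newlines;
    }

    // Shortcut that only works when every record has the same width.
    // recordWidth is assumed to include the trailing newline byte.
    static long fixedWidthLineNumber(long byteOffset, int recordWidth) {
        return byteOffset / recordWidth;
    }

    public static void main(String[] args) {
        // "alpha\n" is 6 bytes and "beta\n" is 5, so lines start at 0, 6, 11.
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lineNumberAt(data, 0));   // 0
        System.out.println(lineNumberAt(data, 6));   // 1
        System.out.println(lineNumberAt(data, 11));  // 2
        // Fixed-width case: 10-byte records, so offset 30 is line 3.
        System.out.println(fixedWidthLineNumber(30, 10));  // 3
    }
}
```

The first method is linear in the number of bytes before the offset, which is
why doing it per split gets expensive on large files. The second is a single
division, but it only holds for fixed-width lines.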

On Fri, Jan 29, 2010 at 8:40 AM, Jeff Zhang <[EMAIL PROTECTED]> wrote:

> I'm afraid you have to write your own InputFormat if you really want to
> use the line number as the key.
> I believe you can reuse most of the code of TextInputFormat, since your
> InputFormat would be almost the same as TextInputFormat except for the key.
>
>
>
>
> On Thu, Jan 28, 2010 at 7:35 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
>> On Thu, Jan 28, 2010 at 4:01 AM, Udaya Lakshmi <[EMAIL PROTECTED]>
>> wrote:
>> > Hi all..
>> >   I have searched the documentation but could not find an input file
>> > format that gives the line number as the key and the line as the value.
>> > Did I miss something? Can someone give me a clue on how to implement
>> > such an input file format?
>> >
>> > Thanks,
>> > Udaya.
>> >
>>
>>
>> Udaya,
>>
>> When using the standard File Input Format:
>>
>> public void map(LongWritable key, Text value, OutputCollector<Text,
>> IntWritable> output, Reporter reporter) throws IOException {
>>
>> key represents the byte offset of the line in the input file. There is
>> no easy way to translate the byte offset to a logical line number,
>> unless all lines are fixed width (not usually the case).
>>
>> Edward
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

--
Best Regards

Jeff Zhang