Hadoop, mail # user - Fileformat query


Re: Fileformat query
Jeff Zhang 2010-01-29, 00:40
I'm afraid you will have to write your own InputFormat if you really want
the line number as the key.
I believe you can reuse most of the TextInputFormat code, since your
InputFormat would be almost identical to TextInputFormat except for the key.
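
As a rough illustration of the key change, here is a minimal sketch of the line-numbering logic in plain Java, with no Hadoop dependencies. The class and method names are hypothetical. In an actual InputFormat this loop would presumably live in a custom RecordReader, and the format would also need to override isSplitable to return false, since a reader that starts in the middle of a file cannot know how many lines came before its split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: pairs each line with its 1-based line number.
class LineNumberSketch {

    // Reads every line from the reader and returns (lineNumber, line) pairs.
    // The numbering is only meaningful when reading starts at the beginning of the file.
    static List<Map.Entry<Long, String>> numberLines(BufferedReader reader) throws IOException {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long lineNumber = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            records.add(new SimpleEntry<>(Long.valueOf(lineNumber), line));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("alpha\nbeta\ngamma\n"));
        // Print each record as "lineNumber<TAB>line".
        for (Map.Entry<Long, String> record : numberLines(reader)) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

The trade-off of this approach is that each file has to be processed by a single reader from its first byte, so large files lose the parallelism that splitting normally provides.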

On Thu, Jan 28, 2010 at 7:35 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> On Thu, Jan 28, 2010 at 4:01 AM, Udaya Lakshmi <[EMAIL PROTECTED]> wrote:
> > Hi all..
> >   I have searched the documentation but could not find an input file
> > format that gives the line number as the key and the line as the value.
> > Did I miss something? Can someone give me a clue about how to implement
> > such an input file format?
> >
> > Thanks,
> > Udaya.
> >
>
>
> Udaya,
>
> When using the standard File Input Format:
>
> public void map(LongWritable key, Text value, OutputCollector<Text,
> IntWritable> output, Reporter reporter) throws IOException {
>
> key represents the byte offset of the line in the input file. There is
> no easy way to translate the byte offset to a logical line number,
> unless all lines are fixed width (not usually the case).
>
> Edward
>
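
For the fixed-width case mentioned above, the translation is simple arithmetic. The following is only a sketch under the assumption that every record, including its line terminator, occupies exactly the same number of bytes; the class and method names are hypothetical.

```java
// Hypothetical helper: converts a byte offset to a 1-based line number
// for files in which every line has the same width in bytes.
class FixedWidthLineNumber {

    // recordWidthBytes must include the line terminator (for example, 9 data bytes + '\n' = 10).
    static long lineNumber(long byteOffset, int recordWidthBytes) {
        if (recordWidthBytes <= 0) {
            throw new IllegalArgumentException("record width must be positive");
        }
        if (byteOffset < 0 || byteOffset % recordWidthBytes != 0) {
            throw new IllegalArgumentException("offset is not the start of a record");
        }
        return byteOffset / recordWidthBytes + 1;
    }

    public static void main(String[] args) {
        // With 10-byte records, the record starting at byte 20 is the third line.
        System.out.println(lineNumber(20L, 10));
    }
}
```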

--
Best Regards

Jeff Zhang