Hadoop >> mail # user >> Fileformat query
Re: Fileformat query
I'm afraid you will have to write your own InputFormat if you really want
the line number as the key.
I believe you can reuse most of the TextInputFormat code, since your
InputFormat would be almost identical to TextInputFormat except for the key.
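
To illustrate the idea, here is a minimal sketch of the record-numbering logic in plain Java, with no Hadoop dependency. The class name LineNumberSketch and the method numberLines are hypothetical and only stand in for what a custom RecordReader would do: count lines as they are read and emit the count as the key instead of the byte offset. It assumes the whole file is read by a single reader; in a real InputFormat that probably means keeping the file from being split (for example by overriding isSplitable to return false), because a reader that starts in the middle of a file cannot know how many lines precede it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the loop inside a custom RecordReader.
public class LineNumberSketch {

    // Reads every line and keys it by its 1-based line number.
    static Map<Long, String> numberLines(Reader in) {
        Map<Long, String> records = new LinkedHashMap<>();
        long lineNumber = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;                    // key: line number, not byte offset
                records.put(lineNumber, line);   // value: the line itself
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Long, String> records = numberLines(new StringReader("alpha\nbeta\ngamma\n"));
        for (Map.Entry<Long, String> record : records.entrySet()) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```

The trade-off of the single-reader assumption is that one large file is then processed by one map task.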

On Thu, Jan 28, 2010 at 7:35 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> On Thu, Jan 28, 2010 at 4:01 AM, Udaya Lakshmi <[EMAIL PROTECTED]> wrote:
> > Hi all..
> >   I have searched the documentation but could not find an input file
> > format that gives the line number as the key and the line as the value.
> > Did I miss something? Can someone give me a clue on how to implement
> > such an input file format?
> >
> > Thanks,
> > Udaya.
> >
>
>
> Udaya,
>
> When using the standard File Input Format:
>
> public void map(LongWritable key, Text value, OutputCollector<Text,
> IntWritable> output, Reporter reporter) throws IOException {
>
> key represents the byte offset of the line in the input file. There is
> no easy way to translate the byte offset to a logical line number
> unless all lines are fixed width (not usually the case).
>
> Edward
>
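
For the fixed-width case Edward mentions, the translation is simple arithmetic. The sketch below is an illustration only: the class OffsetToLine and its method are hypothetical, and it assumes every record occupies exactly recordLength bytes including the line terminator. With variable-length lines no such formula exists, which is why the offset alone is not enough.

```java
// Hypothetical helper: maps a line's starting byte offset to a 1-based
// line number, assuming every line is exactly recordLength bytes long
// (line terminator included).
public class OffsetToLine {

    static long lineNumber(long byteOffset, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        if (byteOffset < 0 || byteOffset % recordLength != 0) {
            throw new IllegalArgumentException("offset is not the start of a record");
        }
        return byteOffset / recordLength + 1;
    }

    public static void main(String[] args) {
        // 10 data bytes plus '\n' gives 11 bytes per record.
        int recordLength = 11;
        for (long offset = 0; offset <= 22; offset += recordLength) {
            System.out.println(offset + " -> line " + lineNumber(offset, recordLength));
        }
    }
}
```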

--
Best Regards

Jeff Zhang