Udaya Lakshmi 2010-01-28, 09:01
Hi all.. I have searched the documentation but could not find an input file format that gives the line number as the key and the line as the value. Did I miss something? Can someone give me a clue on how to implement such an input file format?
Thanks, Udaya.
Edward Capriolo 2010-01-28, 15:35
Udaya,
When using the standard file input format (TextInputFormat), your map method looks like this:
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { /* key = byte offset of the line, value = the line's text */ }
Here, key is the byte offset of the line (the value) within the input file. There is no easy way to translate that byte offset into a logical line number unless every line has the same fixed width, which is not usually the case.
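For the fixed-width case mentioned above, the conversion is a single division. A minimal sketch, assuming every line is exactly `recordWidth` bytes followed by a single '\n' byte (no "\r\n" endings and no variable-length lines); the class and method names are hypothetical:

```java
public class FixedWidthLineNumber {
    // Assumes every line is exactly recordWidth bytes followed by one '\n' byte.
    // Returns a 0-based line number for the byte offset at which a line starts.
    static long lineNumber(long byteOffset, int recordWidth) {
        long bytesPerLine = recordWidth + 1L; // payload plus the newline terminator
        return byteOffset / bytesPerLine;
    }

    public static void main(String[] args) {
        // With 3-byte lines such as "abc\n", lines start at offsets 0, 4, 8, ...
        System.out.println(lineNumber(0, 3)); // 0
        System.out.println(lineNumber(4, 3)); // 1
        System.out.println(lineNumber(8, 3)); // 2
    }
}
```

As soon as a single line differs in length, this arithmetic gives wrong answers for every later line, which is why it only helps with strictly fixed-width data.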
Edward
Jeff Zhang 2010-01-29, 00:40
I'm afraid you have to write your own InputFormat if you really want the line number as the key. I believe you can reuse most of the code of TextInputFormat, since your InputFormat is almost the same as TextInputFormat except for the key.
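The heart of such an InputFormat would be a record reader that keeps a running line counter. That counter is only correct if one reader sees the file from its first byte, so one assumed way to get there in Hadoop is to override FileInputFormat's `isSplitable` to return false. The sketch below deliberately leaves Hadoop out and shows only the counting logic with plain `java.io`; `LineNumberReaderSketch` and `readWithLineNumbers` are hypothetical names, and lines are numbered from 0:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LineNumberReaderSketch {
    // Hypothetical stand-in for a record reader: yields (lineNumber, line) pairs.
    // Only correct when this reader starts at the very beginning of the file.
    static List<Map.Entry<Long, String>> readWithLineNumbers(BufferedReader in) throws IOException {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long lineNumber = 0; // 0-based counter, incremented once per line read
        String line;
        while ((line = in.readLine()) != null) {
            records.add(new AbstractMap.SimpleEntry<>(lineNumber++, line));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("first\nsecond\nthird\n"));
        for (Map.Entry<Long, String> r : readWithLineNumbers(in)) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

The trade-off of refusing to split is that each file goes to a single mapper, so a large file loses the parallelism Hadoop would otherwise give it.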
-- Best Regards
Jeff Zhang
Jeff Zhang 2010-01-29, 01:54
Sorry, my mistake: writing your own InputFormat does not seem like a good idea after all. Determining the starting line number of each split is fairly expensive, because it requires scanning all of the data that comes before that split.
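To see where that cost comes from: the line number of a line starting at byte offset p is the number of newline bytes before p. Each split therefore has to know how many newlines precede its start offset, and computing that means reading everything ahead of it (in practice, an extra pass over the input before the real job). A plain-Java sketch of that bookkeeping, assuming '\n' line endings, an in-memory byte array standing in for the file, and hypothetical split boundaries whose first entry is 0; all names are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class SplitLineNumbers {
    // Counts '\n' bytes in data[from, to).
    static long countNewlines(byte[] data, long from, long to) {
        long n = 0;
        for (long i = from; i < to; i++) {
            if (data[(int) i] == '\n') n++;
        }
        return n;
    }

    // before[i] = number of '\n' bytes preceding splitStarts[i] (assumes splitStarts[0] == 0).
    // This is the expensive part: it reads every byte up to the start of the last split.
    static long[] newlinesBeforeSplits(byte[] data, long[] splitStarts) {
        long[] before = new long[splitStarts.length]; // before[0] stays 0
        for (int i = 1; i < splitStarts.length; i++) {
            before[i] = before[i - 1] + countNewlines(data, splitStarts[i - 1], splitStarts[i]);
        }
        return before;
    }

    // 0-based line number of the line starting at byteOffset inside split `split`.
    static long lineNumber(byte[] data, long[] splitStarts, long[] before, int split, long byteOffset) {
        return before[split] + countNewlines(data, splitStarts[split], byteOffset);
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\nd\n".getBytes(StandardCharsets.US_ASCII);
        long[] splitStarts = {0, 6}; // two hypothetical splits: [0, 6) and [6, 11)
        long[] before = newlinesBeforeSplits(data, splitStarts);
        System.out.println(lineNumber(data, splitStarts, before, 1, 9)); // "d" is line 3
    }
}
```

Only the per-split prefix counts need this global pass; once they are known, each split can number its own lines locally.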
-- Best Regards
Jeff Zhang
Udaya Lakshmi 2010-01-29, 03:29
Thank you Jeff.