RE: how to specify key and value for an input to mapreduce job
Vinayakumar B 2012-02-14, 15:28
1. To read input where both key and value are in text format, you can use
KeyValueTextInputFormat (in the org.apache.hadoop.mapreduce.lib.input package) as the InputFormat class for your Job. This input format uses KeyValueLineRecordReader, which reads each line and separates the key and value present on that line.
Here you need to set the key-value separator in the job configuration.
By default it is '\t'.
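To make the splitting rule concrete: KeyValueLineRecordReader splits each line at the first occurrence of the separator; if the separator is absent, the whole line becomes the key and the value is empty. Depending on your Hadoop version, the separator property is mapreduce.input.keyvaluelinerecordreader.key.value.separator (new mapreduce API) or key.value.separator.in.input.line (old mapred API). Below is a minimal plain-Java sketch of that splitting behavior (no Hadoop dependency; the '-' separator matches the data in the question and is an assumption here):

```java
// Sketch (plain Java, no Hadoop needed) of how KeyValueLineRecordReader
// splits each input line at the FIRST occurrence of the separator.
public class KeyValueSplit {
    // Returns {key, value}; if the separator is absent, the whole line is
    // the key and the value is empty (matching the record reader's behavior).
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("0002mCjpo9-HM", '-');
        System.out.println(kv[0] + " | " + kv[1]); // prints: 0002mCjpo9 | HM
    }
}
```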
2. The reduce output will use the default TextOutputFormat, with LongWritable key and Text value.
In your case you need Text as both key and value.
Since you were using the default TextInputFormat, you were getting the complete line as the value and its position in the file as the key. If you use KeyValueTextInputFormat instead, you will get the desired result.
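The contrast can be sketched in plain Java (no Hadoop needed): TextInputFormat keys each record by the line's starting byte offset in the file, which is exactly the unwanted first column in the output below, whereas KeyValueTextInputFormat keys it by the text before the separator. The sample lines and the '-' separator are taken from the question:

```java
import java.util.*;

// Illustration of why an extra numeric first column appears with the
// default input format, and disappears with KeyValueTextInputFormat.
public class InputFormatContrast {

    // TextInputFormat view: key = byte offset of the line, value = whole line.
    static List<String> textInputView(List<String> lines) {
        List<String> records = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            records.add(offset + "\t" + line);
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }

    // KeyValueTextInputFormat view with separator '-':
    // key = text before the first '-', value = text after it; no offsets.
    static List<String> keyValueView(List<String> lines) {
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            int pos = line.indexOf('-');
            records.add(line.substring(0, pos) + "\t" + line.substring(pos + 1));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("0002mCjpo9-HM", "000AxT9LSA-HM");
        textInputView(lines).forEach(System.out::println);
        keyValueView(lines).forEach(System.out::println);
    }
}
```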
Thanks and Regards,
From: Vamshi Krishna [[EMAIL PROTECTED]]
Sent: Tuesday, February 14, 2012 8:28 PM
To: [EMAIL PROTECTED]
Subject: how to specify key and value for an input to mapreduce job
I have a job which reads all the rows from an HBase table and writes them to a location in DFS, i.e. /user/HSOP. HSOP is a folder which has 9 files, each having their content as
Both the first and second columns are of Text type, as specified in the first job's output format class.
Now I want one more job to read all these files as input and treat the first column element as "key" and the second column element as "value". For that I tried starting a job by specifying the line job.getConfiguration().set("key.value.separator.in.input.line", "-");
In the reduce() method I call context.write(key, value); key is LongWritable and value is Text. But when I look at the output of this job, I see a format like:
46 0002mCjpo9 -HM
253 000AxT9LSA -HM
460 000FYtnxiB -HM
667 000WNVBo9N -HM
874 000dQiseKz -HM
But I don't want the first column added to each row. How can I do that?