MapReduce user mailing list: how to specify key and value for an input to mapreduce job
RE: how to specify key and value for an input to mapreduce job
Hi Vamshi,

1. To read input that has both the key and the value on each line in text format, you can set
KeyValueTextInputFormat (in the org.apache.hadoop.mapreduce.lib.input package) as the InputFormat class of your Job. This input format uses KeyValueLineRecordReader, which reads each line and splits it into the key and the value present on that line.
The key/value separator is set in the job configuration.
By default it is '\t' (tab).

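2. With the default TextOutputFormat, the reduce output is written as the key, a tab, and the value; in your job the key was a LongWritable and the value was Text.
In your case you need Text as both the key and the value (in the reducer and in the job's output key/value classes).
Since you were using the default TextInputFormat, you were getting the complete line as the value and its byte offset in the file as the key. If you use KeyValueTextInputFormat instead, you will get the desired result.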
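The difference described in point 2 can be modelled in plain Java. This is a standard-library sketch, not Hadoop code, and the class and method names are hypothetical. It assumes an identity mapper and reducer, ignores the shuffle/sort, and assumes the input lines are tab-separated: with TextInputFormat the key is the byte offset of each line and TextOutputFormat writes key, tab, value, so the offset ends up in front of every row; with KeyValueTextInputFormat and Text as both output key and value, each row is written back as it was read.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: a plain-Java model, not Hadoop's implementation.
public class OutputFormatDemo {

    // Models TextInputFormat (key = byte offset of the line, value = the line)
    // followed by an identity reducer and TextOutputFormat (key, tab, value).
    public static List<String> withTextInputFormat(String fileContents) {
        List<String> rows = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            rows.add(offset + "\t" + line);
            // The next line starts after this line's bytes plus the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return rows;
    }

    // Models KeyValueTextInputFormat with the default tab separator, Text as
    // both output key and value, an identity reducer and TextOutputFormat.
    public static List<String> withKeyValueTextInputFormat(String fileContents) {
        List<String> rows = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            int pos = line.indexOf('\t');
            String key = (pos == -1) ? line : line.substring(0, pos);
            String value = (pos == -1) ? "" : line.substring(pos + 1);
            rows.add(key + "\t" + value);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two lines shaped like the /user/HSOP files (tab separator is an assumption).
        String file = "00015DEGgJ\t-HM\n00016Pc4Tl\t-HM\n";
        withTextInputFormat(file).forEach(System.out::println);
        withKeyValueTextInputFormat(file).forEach(System.out::println);
    }
}
```

For the two sample lines, the first method yields "0\t00015DEGgJ\t-HM" and "15\t00016Pc4Tl\t-HM" (an offset prepended, as in the output shown in the question), while the second yields the lines unchanged.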

Thanks and Regards,
Vinayakumar B
From: Vamshi Krishna [[EMAIL PROTECTED]]
Sent: Tuesday, February 14, 2012 8:28 PM
Subject: how to specify key and value for an input to mapreduce job

Hi all,
I have a job that reads all the rows from an HBase table and writes them to a location in HDFS, i.e. /user/HSOP. HSOP is a folder that has 9 files, each with content like:
00015DEGgJ    -HM
00016Pc4Tl    -HM
0001H0iImI    -HM
0001Oyb0Ju    -HM
0001hwBEOr    -HM
0002Qx2Uj9    -HM
0002jCs6gr    -HM
0003PMcWRa    -HM
000488xKIE    -HM

Both the first and second columns are of Text type, as specified in the first job's output format class.

Now I want one more job to read all these files as input and treat the first column element as the "key" and the second column element as the "value". For that I tried starting a job with the line job.getConfiguration().set("key.value.separator.in.input.line", "-");
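The separator set in that line decides where each line is split. Below is a plain-Java sketch (standard library only, not Hadoop code; the class and method names are hypothetical) of KeyValueLineRecordReader's splitting rule, assuming Hadoop behaves as follows: only the first character of the configured separator is used, the line is split at the first occurrence of that character, and a line without it becomes the key with an empty value. Assuming the files above are tab-separated, as the default TextOutputFormat writes them, "-" splits each line in the wrong place, while the default tab gives the intended key and value. To my knowledge, in Hadoop 2.x the property is named mapreduce.input.keyvaluelinerecordreader.key.value.separator, with key.value.separator.in.input.line kept as a deprecated alias.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical demo class: models the splitting rule, it is not Hadoop's code.
public class KeyValueSplitDemo {

    // Split at the FIRST occurrence of the separator's first character.
    // Works on chars rather than bytes, which is equivalent for ASCII input.
    public static Map.Entry<String, String> split(String line, String separator) {
        char sep = separator.charAt(0); // only the first character is used
        int pos = line.indexOf(sep);
        if (pos == -1) {
            // No separator: the whole line is the key, the value is empty.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Assumed tab-separated, as TextOutputFormat writes by default.
        String line = "00015DEGgJ\t-HM";

        Map.Entry<String, String> tab = split(line, "\t");
        System.out.println("key=[" + tab.getKey() + "] value=[" + tab.getValue() + "]");

        Map.Entry<String, String> dash = split(line, "-");
        System.out.println("key=[" + dash.getKey() + "] value=[" + dash.getValue() + "]");
    }
}
```

With the tab separator the key is [00015DEGgJ] and the value is [-HM]; with "-" the key keeps the trailing tab and the value loses its leading dash.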

In the reduce() method I had context.write(key, value); where key is a LongWritable and value is Text. But when I look at the output of this job, I see this format:

46    0002mCjpo9    -HM
253    000AxT9LSA    -HM
460    000FYtnxiB    -HM
667    000WNVBo9N    -HM
874    000dQiseKz    -HM

But I don't want the first column to be added to each row. How can I do that?
Can somebody help?