Re: mapreduce linear chaining: ClassCastException
I believe this is what is happening in your case.
The first MapReduce job runs to completion.
When you trigger the second MapReduce job, it runs with the default input format, TextInputFormat, which always hands the mapper a LongWritable key (the byte offset of the line) and a Text value (the line itself). By default, a MapReduce job's output format is TextOutputFormat, which writes each key and value as tab-separated text. When another MR job needs to consume that output as key/value pairs, use KeyValueTextInputFormat, i.e. while setting the config parameters for the second job, set
jobConf.setInputFormat(KeyValueTextInputFormat.class).
If your output key/value pairs use a separator other than the default tab, you need to specify that for the second job as well, using key.value.separator.in.input.line
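
To make the splitting concrete, here is a minimal plain-Java sketch of how a line written by the first job is divided into a key and a value at the first separator. It does not call Hadoop at all; the class name KeyValueSplitSketch and the method splitLine are made up for illustration, and the real input format may differ in details.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, no Hadoop classes involved.
class KeyValueSplitSketch {

    // Everything before the first separator becomes the key and the
    // remainder becomes the value. A line without any separator is
    // treated as a key with an empty value.
    static Map.Entry<String, String> splitLine(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // A line as the first job would write it for <"hadoop", 42>.
        Map.Entry<String, String> tabbed = splitLine("hadoop\t42", '\t');
        System.out.println("key=" + tabbed.getKey() + " value=" + tabbed.getValue());

        // The same idea with a non-default separator.
        Map.Entry<String, String> comma = splitLine("hadoop,42", ',');
        System.out.println("key=" + comma.getKey() + " value=" + comma.getValue());
    }
}
```

Note that both halves come back as text, which is why the second job's mapper sees the count as a string rather than as a number.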

In short, for your case, doing the following in the second MapReduce job should get things in place:
- use jobConf.setInputFormat(KeyValueTextInputFormat.class)
- alter your mapper to accept keys and values of type Text, Text (the count arrives as text, so parse it into an IntWritable yourself)
- swap the key and value for output
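
The steps above could look roughly like the following with the old API (org.apache.hadoop.mapred.*), where the key/value input format class is named KeyValueTextInputFormat. This is only a sketch and has not been run against your cluster: the class names SwapMapper and SecondJobDriver, the job name, and the use of args[0]/args[1] for the temp input path and the final output path are my assumptions, and it needs the Hadoop jars on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical mapper for the second job: receives <word, count-as-text>
// and emits <count, word>.
class SwapMapper extends MapReduceBase
        implements Mapper<Text, Text, IntWritable, Text> {

    private final IntWritable count = new IntWritable();

    public void map(Text word, Text sum,
                    OutputCollector<IntWritable, Text> output,
                    Reporter reporter) throws IOException {
        // The count was written as text by the first job, so parse it here.
        count.set(Integer.parseInt(sum.toString().trim()));
        output.collect(count, word);   // <sum, word>
    }
}

// Hypothetical driver for the second job only.
class SecondJobDriver {
    public static void main(String[] args) throws IOException {
        JobConf conf2 = new JobConf(SecondJobDriver.class);
        conf2.setJobName("swap-count-word");

        // Read each line of the first job's output as <Text key, Text value>.
        conf2.setInputFormat(KeyValueTextInputFormat.class);

        conf2.setMapperClass(SwapMapper.class);
        conf2.setReducerClass(IdentityReducer.class);
        conf2.setOutputKeyClass(IntWritable.class);
        conf2.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf2, new Path(args[0]));   // temp path of the first job
        FileOutputFormat.setOutputPath(conf2, new Path(args[1]));  // final output path

        JobClient.runJob(conf2);
    }
}
```

If the first job wrote its output with a separator other than tab, the key.value.separator.in.input.line property would also have to be set on conf2 before running the job.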

Note that, AFAIK, KeyValueTextInputFormat is not part of the new mapreduce API.
Hope it helps.

Bejoy K S

-----Original Message-----
From: "Periya.Data" <[EMAIL PROTECTED]>
Date: Fri, 14 Oct 2011 17:31:27
Subject: mapreduce linear chaining: ClassCastException

Hi all,
   I am trying a simple extension of the WordCount example in Hadoop. I want to
get the word counts in descending order of frequency. To do that, I employ a
linear chain of MR jobs. The first MR job (MR-1) does the regular wordcount (the
usual example). For the next MR job, I set the mapper to swap the <word,
count> to <count, word>. Then I have the identity reducer simply store
the results.

My MR-1 does its job correctly and stores the result in a temp path.

Question 1: The mapper of the second MR job (MR-2) doesn't like the input
format. I have properly declared, for MapClass2, the input types it
expects and what its output must be. It seems to be expecting a LongWritable. I
suspect that it is trying to look at some index file. I am not sure.
It throws an error like this:

    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

Some Info:
- I use old API (org.apache.hadoop.mapred.*). I am asked to stick with it
for now.
- I use hadoop-0.20.2

For MR-1:
- conf1.setOutputKeyClass(Text.class);
- conf1.setOutputValueClass(IntWritable.class);

For MR-2
- takes in a Text (word) and IntWritable (sum)
- conf2.setOutputKeyClass(IntWritable.class);
- conf2.setOutputValueClass(Text.class);

public class MapClass2 extends MapReduceBase
      implements Mapper<Text, IntWritable, IntWritable, Text> {

      public void map(Text word, IntWritable sum,
              OutputCollector<IntWritable, Text> output,
              Reporter reporter) throws IOException {

          output.collect(sum, word);   // <sum, word>
      }
}

Any suggestions would be helpful. Is my MapClass2 code right in the first
place for swapping? Or should I assume that the mapper reads line by line,
so that it must read in one line, use StrTokenizer to split it up, and
convert the second token (sum) from a string to an int? Or should I mess around
with the OutputKeyComparator class?