Re: mapreduce linear chaining: ClassCastException

Sorry for the KeyValueInputFormat; it is actually KeyValueTextInputFormat. I was replying from my handheld and recalling the class name from memory, so excuse me for that. :)

For your further requirements, like descending order, I believe you will need to play around with a Comparator.
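As a rough sketch of that comparator idea (old mapred API, Hadoop 0.20.x): a WritableComparator that negates the natural IntWritable order, wired in with setOutputKeyComparatorClass on the second job. The class name DescendingIntComparator is illustrative, not from this thread.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sorts the second job's IntWritable keys in descending order by
// negating the result of the natural (ascending) comparison.
public class DescendingIntComparator extends WritableComparator {

    protected DescendingIntComparator() {
        super(IntWritable.class, true); // true => instantiate keys for compare()
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        return -super.compare(a, b); // reverse the natural order
    }
}

// In the driver of the second job:
//   conf2.setOutputKeyComparatorClass(DescendingIntComparator.class);
```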

Thank you

Bejoy K S

-----Original Message-----
From: "Periya.Data" <[EMAIL PROTECTED]>
Date: Sat, 15 Oct 2011 10:59:00
Subject: Re: mapreduce linear chaining: ClassCastException

Fantastic! Thanks much, Bejoy. Now I am able to get the output of my MR-2
nicely. I had to convert the sum from Text to IntWritable, and I am able to
get all the word frequencies <Freq, Word> in ascending order. I used
KeyValueTextInputFormat.class; my program was complaining when I used
KeyValueInputFormat.class.

Now, let me investigate how to do that in descending order... and then the
top-20, etc. I know I must look into RawComparator and more...


On Sat, Oct 15, 2011 at 1:08 AM, <[EMAIL PROTECTED]> wrote:

> Hi
>    I believe what is happening in your case is this:
> The first map reduce job runs to completion.
> When you trigger the second map reduce job, it is triggered with the
> default input format, TextInputFormat, which expects the key and value
> to be of LongWritable and Text type. By default a MapReduce job's output
> format is TextOutputFormat, with key and value tab separated. When you need
> another MR job to consume this output of an MR job as key value pairs, use
> KeyValueInputFormat, ie while setting config parameters for the second job
> set jobConf.setInputFormat(KeyValueInputFormat.class).
> Now if your output key value pairs use a separator other than the default
> tab, then for the second job you need to specify that as well, using
> key.value.separator.in.input.line
> In short, for your case, doing the following in the second map reduce job
> would get things in place:
> -use jobConf.setInputFormat(KeyValueInputFormat.class)
> -alter your mapper to accept key values of type Text,Text
> -swap the key and values within the mapper for output to the reducer, with
> conversions.
> To be noted here, AFAIK KeyValueInputFormat is not a part of the new
> mapreduce API.
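Pulling the steps above together, a minimal driver sketch for the second job (old mapred API, Hadoop 0.20.x). The names WordFreq, tempDir, and outDir are illustrative, not from the thread, and the input format uses the corrected name KeyValueTextInputFormat given later in the thread.

```java
// Driver configuration sketch for the second (swap) job.
JobConf conf2 = new JobConf(WordFreq.class);
conf2.setJobName("wordfreq-swap");
conf2.setInputFormat(KeyValueTextInputFormat.class);  // MR-1 output is key<TAB>value
// only needed if MR-1 wrote a separator other than the default tab:
// conf2.set("key.value.separator.in.input.line", ",");
conf2.setMapperClass(MapClass2.class);                // swaps <word, count> to <count, word>
conf2.setReducerClass(IdentityReducer.class);
conf2.setOutputKeyClass(IntWritable.class);
conf2.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(conf2, new Path(tempDir));
FileOutputFormat.setOutputPath(conf2, new Path(outDir));
JobClient.runJob(conf2);
```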
> Hope it helps.
> Regards
> Bejoy K S
> -----Original Message-----
> From: "Periya.Data" <[EMAIL PROTECTED]>
> Date: Fri, 14 Oct 2011 17:31:27
> Subject: mapreduce linear chaining: ClassCastException
> Hi all,
>   I am trying a simple extension of the WordCount example in Hadoop. I want
> to get the word counts sorted by frequency in descending order. To do that I
> employ a linear chain of MR jobs. The first MR job (MR-1) does the regular
> wordcount (the usual example). For the next MR job, I set the mapper to swap
> the <word, count> to <count, word>. Then I have the Identity reducer simply
> store the results.
> My MR-1 does its job correctly and stores the result in a temp path.
> Question 1: The mapper of the second MR job (MR-2) doesn't like the input
> format. I have properly set, for MapClass2, what it expects as input and
> what its output must be. It seems to be expecting a LongWritable. I suspect
> that it is trying to look at some index file. I am not sure.
> It throws an error like this:
> <code>
>    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
> </code>
> Some Info:
> - I use old API (org.apache.hadoop.mapred.*). I am asked to stick with it
> for now.
> - I use hadoop-0.20.2
> For MR-1:
> - conf1.setOutputKeyClass(Text.class);
> - conf1.setOutputValueClass(IntWritable.class);
> For MR-2
> - takes in a Text (word) and IntWritable (sum)
> - conf2.setOutputKeyClass(IntWritable.class);
> - conf2.setOutputValueClass(Text.class);
> <code>
> public class MapClass2 extends MapReduceBase
>      implements Mapper<Text, IntWritable, IntWritable, Text> {
> </code>
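For reference, a hedged sketch of how MapClass2 might look after the Text,Text change suggested earlier in the thread; this is not the poster's actual code, just one way to implement the swap with KeyValueTextInputFormat.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// With KeyValueTextInputFormat, both input key and value arrive as Text,
// so the count must be parsed before swapping <word, count> to <count, word>.
public class MapClass2 extends MapReduceBase
        implements Mapper<Text, Text, IntWritable, Text> {

    private final IntWritable count = new IntWritable();

    public void map(Text word, Text value,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException {
        count.set(Integer.parseInt(value.toString().trim()));
        output.collect(count, word); // emit <count, word>
    }
}
```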