Hadoop >> mail # user >> mapreduce linear chaining: ClassCastException


Re: mapreduce linear chaining: ClassCastException
Great!

Sorry about the KeyValueInputFormat confusion; the class is actually KeyValueTextInputFormat. I was replying from my handheld and recalling the class name from memory, so excuse me for that. :)

For your further requirements, like descending order, you will need to play around with a Comparator, I believe.
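
To illustrate the idea with a rough sketch in plain Java (no Hadoop dependency; class and method names here are purely illustrative, not from this thread): descending order just means negating the natural comparison. In the old API that same negation would live in a comparator registered via jobConf.setOutputKeyComparatorClass(...), e.g. a subclass of IntWritable.Comparator returning -super.compare(...), so the shuffle sorts keys largest-first.

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java sketch of the inversion trick behind a descending sort.
// In a Hadoop job the same negation would go into a RawComparator
// subclass registered with jobConf.setOutputKeyComparatorClass(...).
// (Illustrative only; not code from the original mail.)
public class DescendingDemo {
    static Integer[] sortDescending(Integer[] counts) {
        Arrays.sort(counts, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return -a.compareTo(b); // negate to invert the natural order
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortDescending(new Integer[]{3, 7, 1})));
    }
}
```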

Thank you

Regards
Bejoy K S

-----Original Message-----
From: "Periya.Data" <[EMAIL PROTECTED]>
Date: Sat, 15 Oct 2011 10:59:00
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Subject: Re: mapreduce linear chaining: ClassCastException

Fantastic! Thanks much, Bejoy. Now I am able to get the output of my MR-2
nicely. I had to convert the sum (which arrives as Text) to IntWritable, and
I now get all the word frequencies as <Freq, Word> pairs in ascending order.
I used "KeyValueTextInputFormat.class". My program was complaining when I
used "KeyValueInputFormat".
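
For anyone following along, the crux of MR-2's mapper can be sketched in plain Java (no Hadoop dependency; this is an illustration, not the code from this thread): KeyValueTextInputFormat splits each line of MR-1's output at the first tab into a (Text, Text) pair, and the mapper swaps them, parsing the count into an IntWritable so the keys sort numerically. If MR-1 had used a separator other than tab, the second job would also need key.value.separator.in.input.line set to that character.

```java
// Plain-Java sketch of what MR-2's mapper does to one line of MR-1's
// output. KeyValueTextInputFormat splits at the first tab; the mapper
// then emits <count, word>. (Illustrative only; in the real job this
// logic lives in map() and the count becomes an IntWritable.)
public class SwapDemo {
    static String swap(String line) {
        int tab = line.indexOf('\t');
        String word = line.substring(0, tab);                  // key half of the line
        int count = Integer.parseInt(line.substring(tab + 1)); // value half, parsed
        return count + "\t" + word;                            // swapped: <count, word>
    }

    public static void main(String[] args) {
        System.out.println(swap("hadoop\t42"));
    }
}
```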

Now, let me investigate how to do that in descending order...and then
top-20...etc. I know I must look into RawComparator and more...
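
As a side note on the top-20 idea, one common pattern (not something discussed in this thread; plain Java, names illustrative) is to keep a bounded sorted structure of at most N entries in each task and emit only its contents at the end, so only N records per task survive.

```java
import java.util.TreeMap;

// Plain-Java sketch of the bounded "top N" pattern often used inside a
// MapReduce task: keep at most N entries in a sorted map, evicting the
// smallest count whenever the bound is exceeded. (Illustrative only;
// ties on count would overwrite each other here, so a real job would
// need to handle duplicate counts.)
public class TopNDemo {
    static TreeMap<Integer, String> topN(int n, int[] counts, String[] words) {
        TreeMap<Integer, String> top = new TreeMap<Integer, String>();
        for (int i = 0; i < counts.length; i++) {
            top.put(counts[i], words[i]);
            if (top.size() > n) {
                top.remove(top.firstKey()); // evict the smallest count
            }
        }
        return top;
    }

    public static void main(String[] args) {
        int[] counts = {5, 9, 2, 7};
        String[] words = {"a", "b", "c", "d"};
        System.out.println(topN(2, counts, words)); // keeps the two largest counts
    }
}
```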

Thanks,
PD.

On Sat, Oct 15, 2011 at 1:08 AM, <[EMAIL PROTECTED]> wrote:

> Hi
>    I believe this is what is happening in your case:
> The first MapReduce job runs to completion.
> When you trigger the second MapReduce job, it runs with the default input
> format, TextInputFormat, which expects the key and value to be of
> LongWritable and Text type. By default, a MapReduce job's output format is
> TextOutputFormat, with key and value tab separated. When another MR job
> needs to consume that output as key/value pairs, use KeyValueInputFormat,
> i.e. while setting config parameters for the second job, set
> jobConf.setInputFormat(KeyValueInputFormat.class).
> If your output key/value pairs use a separator other than the default tab,
> then for the second job you need to specify that as well, using
> key.value.separator.in.input.line.
>
> In short, for your case, doing the following in the second MapReduce job
> should get things in place:
> - use jobConf.setInputFormat(KeyValueInputFormat.class)
> - alter your mapper to accept keys and values of type Text,Text
> - swap the key and value within the mapper (with the necessary type
> conversions) when emitting to the reducer.
>
> To be noted here: AFAIK, KeyValueInputFormat is not part of the new
> mapreduce API.
>
> Hope it helps.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: "Periya.Data" <[EMAIL PROTECTED]>
> Date: Fri, 14 Oct 2011 17:31:27
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> Subject: mapreduce linear chaining: ClassCastException
>
> Hi all,
>   I am trying a simple extension of the WordCount example in Hadoop. I
> want to get the word frequencies in descending order. To do that, I employ
> a linear chain of MR jobs. The first MR job (MR-1) does the regular word
> count (the usual example). For the next MR job, I set the mapper to swap
> <word, count> to <count, word>. Then I have the identity reducer simply
> store the results.
>
> MR-1 does its job correctly and stores the result in a temp path.
>
> Question 1: The mapper of the second MR job (MR-2) doesn't like the input
> format. I have properly declared for MapClass2 what input it expects and
> what its output must be, yet it seems to be expecting a LongWritable. I
> suspect it is trying to look at some index file. I am not sure.
>
>
> It throws an error like this:
>
> <code>
>    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
> be cast to org.apache.hadoop.io.Text
> </code>
>
> Some Info:
> - I use old API (org.apache.hadoop.mapred.*). I am asked to stick with it
> for now.
> - I use hadoop-0.20.2
>
> For MR-1:
> - conf1.setOutputKeyClass(Text.class);
> - conf1.setOutputValueClass(IntWritable.class);
>
> For MR-2
> - takes in a Text (word) and IntWritable (sum)
> - conf2.setOutputKeyClass(IntWritable.class);
> - conf2.setOutputValueClass(Text.class);
>
> <code>
> public class MapClass2 extends MapReduceBase
>      implements Mapper<Text, IntWritable, IntWritable, Text> {
> </code>