HDFS, mail # user - Mapreduce using JSONObjects


Max Lebedev 2013-06-04, 22:49
Re: Mapreduce using JSONObjects
Shahab Yunus 2013-06-04, 23:07
I don't think JSONObject implements the interface required for a class/type
to be used as a key in the MapReduce library.
WritableComparable is the one, I think.

Regards,
Shahab
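One way to sidestep this is to keep the key as a Hadoop Text and emit a canonical form of the JSON string from the mapper, so that objects with the same fields in a different order compare equal. The sketch below is only an illustration of that canonicalization idea, not code from this thread; it assumes a flat JSON object with no nested braces and no commas inside string values, and the class name `CanonicalJson` is made up for the example.

```java
import java.util.Arrays;

public class CanonicalJson {
    // Build an order-independent key for a flat JSON object by sorting
    // its "name":value members lexicographically. Assumes no nested
    // objects/arrays and no commas embedded in string values.
    static String canonicalize(String json) {
        String body = json.trim();
        if (body.startsWith("{")) body = body.substring(1);
        if (body.endsWith("}")) body = body.substring(0, body.length() - 1);
        String[] members = body.split(",");
        Arrays.sort(members);
        return "{" + String.join(",", members) + "}";
    }

    public static void main(String[] args) {
        // The two sample lines from the original question, which differ
        // only in member order, canonicalize to the same key.
        String a = "{\"ts\":1368758947.291035,\"isSecure\":true,\"version\":2,\"source\":\"sdk\",\"debug\":false}";
        String b = "{\"ts\":1368758947.291035,\"version\":2,\"source\":\"sdk\",\"isSecure\":true,\"debug\":false}";
        System.out.println(canonicalize(a).equals(canonicalize(b)));
    }
}
```

With the canonical string wrapped in a Text key, the shuffle groups duplicates together and the reducer can emit just the first value, with no need for the key class itself to be a JSONObject.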
On Tue, Jun 4, 2013 at 6:49 PM, Max Lebedev <[EMAIL PROTECTED]> wrote:

> Hi. I've been trying to use JSONObjects to identify duplicates in JSON
> strings. The duplicate strings contain the same data, but not necessarily
> in the same order. For example, the following two lines should be
> identified as duplicates (and filtered):
>
> {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}
>
> {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}
>
> This is the code:
>
> class DupFilter {
>
>     public static class Map extends MapReduceBase implements
>             Mapper<LongWritable, Text, JSONObject, Text> {
>
>         public void map(LongWritable key, Text value,
>                 OutputCollector<JSONObject, Text> output, Reporter reporter)
>                 throws IOException {
>             JSONObject jo = null;
>             try {
>                 jo = new JSONObject(value.toString());
>             } catch (JSONException e) {
>                 e.printStackTrace();
>             }
>             output.collect(jo, value);
>         }
>     }
>
>     public static class Reduce extends MapReduceBase implements
>             Reducer<JSONObject, Text, NullWritable, Text> {
>
>         public void reduce(JSONObject jo, Iterator<Text> lines,
>                 OutputCollector<NullWritable, Text> output, Reporter reporter)
>                 throws IOException {
>             output.collect(null, lines.next());
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         JobConf conf = new JobConf(DupFilter.class);
>         conf.setOutputKeyClass(JSONObject.class);
>         conf.setOutputValueClass(Text.class);
>         conf.setMapperClass(Map.class);
>         conf.setReducerClass(Reduce.class);
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>         JobClient.runJob(conf);
>     }
> }
>
> I get the following error:
>
>
> java.lang.ClassCastException: class org.json.JSONObject
>         at java.lang.Class.asSubclass(Class.java:3027)
>         at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:817)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
>
>
>
> It looks like it has something to do with conf.setOutputKeyClass(). Am I
> doing something wrong here?
>
>
> Thanks,
>
> Max Lebedev
>
Mischa Tuffield 2013-06-04, 23:39
Rahul Bhattacharjee 2013-06-05, 02:58