HDFS >> mail # user >> Mapreduce using JSONObjects


Max Lebedev 2013-06-04, 22:49
Re: Mapreduce using JSONObjects
I don't think JSONObject implements the interface required for a class/type
to be used as a key in the MapReduce library. WritableComparable is the one,
I think.
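One way around that (a rough sketch; the class name and the flat-JSON assumption are mine, not from this thread) is to canonicalize each record into a field-sorted string in the mapper and emit that as a Text key, since Text already implements WritableComparable:

```java
import java.util.Arrays;

public class JsonCanonicalizer {
    // Canonicalize a *flat* JSON object (no nested objects or arrays,
    // and no commas inside string values) by sorting its key:value pairs.
    // The result is stable across field order, so records that differ only
    // in field order map to the same Text key.
    public static String canonicalize(String json) {
        String body = json.trim();
        body = body.substring(1, body.length() - 1); // strip outer { }
        String[] fields = body.split(",");
        Arrays.sort(fields);                         // order-independent form
        return "{" + String.join(",", fields) + "}";
    }

    public static void main(String[] args) {
        String a = "{\"ts\":1368758947.291035,\"isSecure\":true,"
                 + "\"version\":2,\"source\":\"sdk\",\"debug\":false}";
        String b = "{\"ts\":1368758947.291035,\"version\":2,"
                 + "\"source\":\"sdk\",\"isSecure\":true,\"debug\":false}";
        // The two duplicate records from the question canonicalize identically.
        System.out.println(canonicalize(a).equals(canonicalize(b))); // true
    }
}
```

In the mapper you would then emit `new Text(canonicalize(value.toString()))` as the key and let the reducer keep one value per group; a real JSON library would be needed for nested records.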

Regards,
Shahab
On Tue, Jun 4, 2013 at 6:49 PM, Max Lebedev <[EMAIL PROTECTED]> wrote:

> Hi. I've been trying to use JSONObjects to identify duplicates in
> JSONStrings.
> The duplicate strings contain the same data, but not necessarily in the
> same order. For example the following two lines should be identified as
> duplicates (and filtered).
>
>
> {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}
>
> {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}
>
> This is the code:
>
> class DupFilter {
>
>     public static class Map extends MapReduceBase implements
>             Mapper<LongWritable, Text, JSONObject, Text> {
>
>         public void map(LongWritable key, Text value,
>                 OutputCollector<JSONObject, Text> output, Reporter reporter)
>                 throws IOException {
>             JSONObject jo = null;
>             try {
>                 jo = new JSONObject(value.toString());
>             } catch (JSONException e) {
>                 e.printStackTrace();
>             }
>             output.collect(jo, value);
>         }
>     }
>
>     public static class Reduce extends MapReduceBase implements
>             Reducer<JSONObject, Text, NullWritable, Text> {
>
>         public void reduce(JSONObject jo, Iterator<Text> lines,
>                 OutputCollector<NullWritable, Text> output, Reporter reporter)
>                 throws IOException {
>             output.collect(null, lines.next());
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         JobConf conf = new JobConf(DupFilter.class);
>         conf.setOutputKeyClass(JSONObject.class);
>         conf.setOutputValueClass(Text.class);
>         conf.setMapperClass(Map.class);
>         conf.setReducerClass(Reduce.class);
>         conf.setInputFormat(TextInputFormat.class);
>         conf.setOutputFormat(TextOutputFormat.class);
>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>         JobClient.runJob(conf);
>     }
> }
>
> I get the following error:
>
>
> java.lang.ClassCastException: class org.json.JSONObject
>         at java.lang.Class.asSubclass(Class.java:3027)
>         at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:817)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
>
>
>
> It looks like it has something to do with conf.setOutputKeyClass(). Am I
> doing something wrong here?
>
>
> Thanks,
>
> Max Lebedev
>
Mischa Tuffield 2013-06-04, 23:39
Rahul Bhattacharjee 2013-06-05, 02:58