MapReduce >> mail # user >> Emitting Java Collection as mapper output


Emitting Java Collection as mapper output
Hello list,

Is it possible to emit Java collections from a mapper?

My mapper looks like this:
public class UKOOAMapper extends Mapper<LongWritable, Text, LongWritable, List<Text>> {

    public static Text CDPX = new Text();
    public static Text CDPY = new Text();
    public static List<Text> vals = new ArrayList<Text>();
    public static LongWritable count = new LongWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("Q")) {
            CDPX.set(line.substring(2, 13).trim());
            CDPY.set(line.substring(20, 25).trim());
            vals.add(CDPX);
            vals.add(CDPY);
            context.write(count, vals);
        }
    }
}

And the main method of the driver class is:
public static void main(String[] args)
        throws IOException, InterruptedException, ClassNotFoundException {

    Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
    Configuration conf = new Configuration();
    Job job = new Job(conf, "SupportFileValidation");
    conf.set("mapreduce.output.key.field.separator", " ");
    job.setMapOutputValueClass(List.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(UKOOAMapper.class);
    job.setReducerClass(ValidationReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, filePath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/" + filePath));
    job.waitForCompletion(true);
}

When I try to execute the program, I get the following error:
12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
12/07/10 16:41:46 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/07/10 16:41:47 INFO mapred.JobClient:  map 0% reduce 0%
12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0

Need some guidance from the experts. Please let me know where I am
going wrong. Many thanks.

Regards,
    Mohammad Tariq
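
Editor's note: the NullPointerException in SerializationFactory.getSerializer is consistent with Hadoop finding no serializer for the map output value class. With the default configuration, map output keys and values are expected to implement org.apache.hadoop.io.Writable, and java.util.List does not. One common approach is to wrap the collection in a small Writable class (or to use a subclass of ArrayWritable). The sketch below only illustrates the write/readFields pattern such a wrapper follows. It is not code from this thread: WritableLike is a stand-in interface so the example compiles without Hadoop on the classpath, and StringListWritable and TextListSketch are hypothetical names. In a real job the class would implement org.apache.hadoop.io.Writable, would probably hold Text elements, and would be the class passed to job.setMapOutputValueClass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods),
// declared here only so the sketch compiles without Hadoop.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a list of strings that can serialize itself.
class StringListWritable implements WritableLike {
    private final List<String> values = new ArrayList<>();

    // A no-argument constructor lets a framework create empty instances
    // and then fill them through readFields.
    StringListWritable() {}

    StringListWritable(List<String> initial) {
        values.addAll(initial);
    }

    List<String> get() {
        return values;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Length prefix first, then each element.
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Instances may be reused, so discard any previous contents.
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }
}

class TextListSketch {
    // Serializes the list to bytes and reads it back into a fresh instance.
    static List<String> roundTrip(List<String> input) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new StringListWritable(input).write(new DataOutputStream(bytes));

            StringListWritable copy = new StringListWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return new ArrayList<>(copy.get());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two sample field values standing in for CDPX and CDPY.
        System.out.println(roundTrip(Arrays.asList("123456.7", "8901")));
    }
}
```

Note the clear() call in readFields: a value object that is reused between records needs to be reset before it is refilled. The static vals list in the mapper above has the same issue, since it is never cleared between calls to map and so grows with every matching line.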