Emitting Java Collection as mapper output
Hello list,

      Is it possible to emit a Java collection from a mapper?

My code looks like this -
public class UKOOAMapper extends Mapper<LongWritable, Text, LongWritable, List<Text>> {

    public static Text CDPX = new Text();
    public static Text CDPY = new Text();
    public static List<Text> vals = new ArrayList<Text>();
    public static LongWritable count = new LongWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("Q")) {
            CDPX.set(line.substring(2, 13).trim());
            CDPY.set(line.substring(20, 25).trim());
            vals.add(CDPX);
            vals.add(CDPY);
            context.write(count, vals);
        }
    }
}

And the driver class is -
public static void main(String[] args)
        throws IOException, InterruptedException, ClassNotFoundException {

    Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
    Configuration conf = new Configuration();
    Job job = new Job(conf, "SupportFileValidation");
    conf.set("mapreduce.output.key.field.separator", " ");
    job.setMapOutputValueClass(List.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(UKOOAMapper.class);
    job.setReducerClass(ValidationReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, filePath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/" + filePath));
    job.waitForCompletion(true);
}

When I try to execute the program, I get the following error -
12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
12/07/10 16:41:46 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/07/10 16:41:47 INFO mapred.JobClient:  map 0% reduce 0%
12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0
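
From the stack trace it looks like SerializationFactory cannot find a
serializer for List, so maybe the Writable wrapper sketched above is the
right direction. If so, I guess the driver would also need to change along
these lines (again, just my assumption):

    job.setMapOutputValueClass(TextArrayWritable.class);   // instead of List.class
    job.setOutputValueClass(TextArrayWritable.class);       // if the reducer emits the same wrapper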

Need some guidance from the experts. Please let me know where I am
going wrong. Many thanks.

Regards,
    Mohammad Tariq