Re: Emitting Java Collection as mapper output
Hello Harsh,

        Thank you so much for the valuable response. I'll proceed as
suggested by you.

Regards,
    Mohammad Tariq
On Tue, Jul 10, 2012 at 5:05 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Short answer: Yes.
>
> With Writable serialization, there's *some* support for collection
> structures in the form of MapWritable and ArrayWritable. You can make
> use of these classes.
>
> However, I suggest using Apache Avro for these things; it's much better
> to use its schema/reflect-oriented serialization than using Writables.
> See http://avro.apache.org
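
[Editor's note: to make the suggestion above concrete — Writable serialization works because each value type knows how to write and re-read itself over a binary stream, which is the contract that MapWritable, ArrayWritable, and custom Writables all implement, and which java.util.List lacks. The sketch below mimics that contract with plain JDK streams; it is not Hadoop code, and the TextPair name and fields are illustrative assumptions based on the mapper later in this thread.]

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Pure-JDK sketch of the Writable-style write/readFields contract.
// Hadoop's real Writable interface has the same two-method shape; a raw
// java.util.List has no such contract, which is why it cannot be emitted
// as a map output value without a registered serializer.
public class TextPair {
    public String cdpx = "";
    public String cdpy = "";

    // Serialize both fields, in a fixed order, to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(cdpx);
        out.writeUTF(cdpy);
    }

    // Re-read the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        cdpx = in.readUTF();
        cdpy = in.readUTF();
    }
}
```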
>
> On Tue, Jul 10, 2012 at 4:45 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello list,
>>
>>       Is it possible to emit Java collections from a mapper??
>>
>> My code looks like this -
>> public class UKOOAMapper extends Mapper<LongWritable, Text,
>> LongWritable, List<Text>> {
>>
>>         public static Text CDPX = new Text();
>>         public static Text CDPY = new Text();
>>         public static List<Text> vals = new ArrayList<Text>();
>>         public static LongWritable count = new LongWritable(1);
>>
>>         public void map(LongWritable key, Text value, Context context)
>>                         throws IOException, InterruptedException {
>>                 String line = value.toString();
>>                 if (line.startsWith("Q")) {
>>                         CDPX.set(line.substring(2, 13).trim());
>>                         CDPY.set(line.substring(20, 25).trim());
>>                         vals.add(CDPX);
>>                         vals.add(CDPY);
>>                         context.write(count, vals);
>>                 }
>>         }
>> }
>>
>> And the driver class is -
>> public static void main(String[] args) throws IOException,
>> InterruptedException, ClassNotFoundException {
>>
>>                 Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
>>                 Configuration conf = new Configuration();
>>                 Job job = new Job(conf, "SupportFileValidation");
>>                 conf.set("mapreduce.output.key.field.separator", "              ");
>>                 job.setMapOutputValueClass(List.class);
>>                 job.setOutputKeyClass(LongWritable.class);
>>                 job.setOutputValueClass(Text.class);
>>                 job.setMapperClass(UKOOAMapper.class);
>>                 job.setReducerClass(ValidationReducer.class);
>>                 job.setInputFormatClass(TextInputFormat.class);
>>                 job.setOutputFormatClass(TextOutputFormat.class);
>>                 FileInputFormat.addInputPath(job, filePath);
>>                 FileOutputFormat.setOutputPath(job, new Path("/mapout/"+filePath));
>>                 job.waitForCompletion(true);
>>         }
>>
>> When I am trying to execute the program, I am getting the following error -
>> 12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes
>> where applicable
>> 12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the
>> same.
>> 12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
>> 12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
>> 12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
>> 12/07/10 16:41:46 INFO mapred.Task:  Using ResourceCalculatorPlugin :
>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
>> 12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
>> 12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
>> 12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.NullPointerException
>>         at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
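
[Editor's note: the stack trace above is cut off, but the root cause is visible in the first frame — SerializationFactory.getSerializer finds no serializer registered for java.util.List, so MapTask's output buffer dereferences a null serializer and throws the NullPointerException. The fix is what Harsh describes: make the map output value a type Hadoop can serialize, e.g. a Writable subclass such as

    public class TextArrayWritable extends ArrayWritable {
        public TextArrayWritable() { super(Text.class); } // no-arg ctor required for deserialization
    }

with job.setMapOutputValueClass(TextArrayWritable.class) in the driver. The sketch below splits the mapper's field extraction into a plain-Java helper so it can be checked without a Hadoop classpath; the UkooaFields name and the sample record layout are illustrative assumptions based on the substring offsets in the mapper above.]

```java
import java.util.Arrays;

// Plain-Java extraction logic from the mapper above, separated out so it
// can be unit-tested standalone. In the real mapper the returned fields
// would be wrapped in Text objects inside a Writable (e.g. a
// TextArrayWritable subclass of ArrayWritable), not a raw List<Text>.
public class UkooaFields {
    // Returns {CDPX, CDPY} for a "Q" record, or null for any other line,
    // using the same fixed-column offsets as the mapper above.
    public static String[] extract(String line) {
        if (line == null || !line.startsWith("Q")) {
            return null;
        }
        String cdpx = line.substring(2, 13).trim();
        String cdpy = line.substring(20, 25).trim();
        return new String[] { cdpx, cdpy };
    }
}
```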