MapReduce >> mail # user >> Emitting Java Collection as mapper output


Re: Emitting Java Collection as mapper output
Hello Harsh,

        Thank you so much for the valuable response. I'll proceed as
you suggested.

Regards,
    Mohammad Tariq
On Tue, Jul 10, 2012 at 5:05 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Short answer: Yes.
>
> With Writable serialization, there's *some* support for collection
> structures in the form of MapWritable and ArrayWritable. You can make
> use of these classes.
>
> However, I suggest using Apache Avro for these things. It's much better
> to use its schema/reflect-oriented serialization than Writables.
> See http://avro.apache.org
>
> On Tue, Jul 10, 2012 at 4:45 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello list,
>>
>>       Is it possible to emit Java collections from a mapper?
>>
>> My code looks like this -
>> public class UKOOAMapper extends Mapper<LongWritable, Text,
>> LongWritable, List<Text>> {
>>
>>         public static Text CDPX = new Text();
>>         public static Text CDPY = new Text();
>>         public static List<Text> vals = new ArrayList<Text>();
>>         public static LongWritable count = new LongWritable(1);
>>
>>         public void map(LongWritable key, Text value, Context context)
>>                         throws IOException, InterruptedException {
>>                 String line = value.toString();
>>                 if (line.startsWith("Q")) {
>>                         CDPX.set(line.substring(2, 13).trim());
>>                         CDPY.set(line.substring(20, 25).trim());
>>                         vals.add(CDPX);
>>                         vals.add(CDPY);
>>                         context.write(count, vals);
>>                 }
>>         }
>> }
>>
>> And the driver class is -
>> public static void main(String[] args) throws IOException,
>> InterruptedException, ClassNotFoundException {
>>
>>                 Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
>>                 Configuration conf = new Configuration();
>>                 Job job = new Job(conf, "SupportFileValidation");
>>                 conf.set("mapreduce.output.key.field.separator", "              ");
>>                 job.setMapOutputValueClass(List.class);
>>                 job.setOutputKeyClass(LongWritable.class);
>>                 job.setOutputValueClass(Text.class);
>>                 job.setMapperClass(UKOOAMapper.class);
>>                 job.setReducerClass(ValidationReducer.class);
>>                 job.setInputFormatClass(TextInputFormat.class);
>>                 job.setOutputFormatClass(TextOutputFormat.class);
>>                 FileInputFormat.addInputPath(job, filePath);
>>                 FileOutputFormat.setOutputPath(job, new Path("/mapout/"+filePath));
>>                 job.waitForCompletion(true);
>>         }
>>
>> When I try to run the program, I get the following error -
>> 12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
>> 12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
>> 12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
>> 12/07/10 16:41:46 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
>> 12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
>> 12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
>> 12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.NullPointerException
>>         at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
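
The trace stops inside SerializationFactory.getSerializer, which is consistent with Hadoop finding no serialization for the map output value class: java.util.List does not implement Writable, so job.setMapOutputValueClass(List.class) gives the framework nothing it can serialize. Below is a minimal sketch of the contract a list-like value type has to satisfy. To keep it runnable without Hadoop on the classpath, the Writable interface here is a local stand-in that mirrors the two methods of org.apache.hadoop.io.Writable. StringListWritable, WritableListSketch and the sample values are hypothetical names chosen for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: declared here
// only so the sketch compiles without Hadoop on the classpath).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a list of strings that knows how to serialize itself.
class StringListWritable implements Writable {
    private final List<String> values = new ArrayList<>();

    // A no-argument constructor lets a framework create empty instances
    // and then fill them through readFields.
    StringListWritable() {}

    StringListWritable(List<String> initial) {
        values.addAll(initial);
    }

    List<String> get() {
        return values;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Length prefix first, then each element.
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Clear first, because the same instance may be reused for many records.
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }
}

public class WritableListSketch {
    // Serializes src to bytes and reads it back into a fresh instance.
    static StringListWritable roundTrip(StringListWritable src) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        StringListWritable copy = new StringListWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // Sample values are made up; they stand in for the two fields cut from each "Q" line.
        StringListWritable original = new StringListWritable(Arrays.asList("532406.7", "81234"));
        System.out.println(roundTrip(original).get());
    }
}
```

In a real job, a class like this would implement org.apache.hadoop.io.Writable directly and be passed to job.setMapOutputValueClass(...) in place of List.class. The ArrayWritable and MapWritable classes mentioned in the reply follow the same write/readFields pattern. Clearing the list in readFields matters for the same reason the static vals list in the mapper should be cleared before each write: otherwise values from earlier records accumulate.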