MapReduce >> mail # user >> Avro data file support in Hadoop


Avro data file support in Hadoop
Hello,

I am trying to write an MR job to process an Avro data file, which
contains serialized objects conforming to a schema.

The schema is something like this:

name:string
surname:string
age:long
address:string
country:string
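For reference, that field list would look something like this as an Avro record schema (the record name "Person" is just my placeholder):

```json
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name",    "type": "string"},
    {"name": "surname", "type": "string"},
    {"name": "age",     "type": "long"},
    {"name": "address", "type": "string"},
    {"name": "country", "type": "string"}
  ]
}
```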

1) My plan was to read the Avro data file using AvroMapper.
2) Create a custom key (CustomKey) containing [name, surname, age] and a
custom value (CustomValue) containing [address, country].
3) Collect a Pair of CustomKey and CustomValue.
4) Create an AvroReducer, expecting a CustomKey key and an iterator of
CustomValue.
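For concreteness, here is roughly what I tried, sketched against the old org.apache.avro.mapred API (Person, CustomKey and CustomValue are my own record classes, not Avro built-ins; this sketch needs the avro-mapred and Hadoop jars to compile):

```java
// Rough sketch of steps 1-4 above (old org.apache.avro.mapred API).
// Person, CustomKey and CustomValue are my own record classes.
import java.io.IOException;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.mapred.Reporter;

public class PersonMapper
    extends AvroMapper<Person, Pair<CustomKey, CustomValue>> {
  @Override
  public void map(Person p,
                  AvroCollector<Pair<CustomKey, CustomValue>> collector,
                  Reporter reporter) throws IOException {
    // Split each record into [name, surname, age] / [address, country].
    CustomKey key = new CustomKey(p.getName(), p.getSurname(), p.getAge());
    CustomValue val = new CustomValue(p.getAddress(), p.getCountry());
    collector.collect(new Pair<CustomKey, CustomValue>(key, val));
  }
}

class PersonReducer
    extends AvroReducer<CustomKey, CustomValue, Pair<CustomKey, CustomValue>> {
  @Override
  public void reduce(CustomKey key, Iterable<CustomValue> values,
                     AvroCollector<Pair<CustomKey, CustomValue>> collector,
                     Reporter reporter) throws IOException {
    // This is where I expected a CustomKey, but judging by the exception
    // a GenericData.Record seems to arrive instead.
    for (CustomValue v : values) {
      collector.collect(new Pair<CustomKey, CustomValue>(key, v));
    }
  }
}
```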

I am not able to get the reducer to work. It looks like AvroJob expects the
mapper output key and value to be of type AvroKey and AvroValue. I even
tried wrapping my custom key and value in AvroKey and AvroValue, but it
didn't help.

From the exception, it looks like the AvroReducer receives a
GenericData.Record object.

Going further, I wish to do a secondary sort based on age, using
[name, surname] in the grouping comparator and a custom sort comparator.

I am not sure whether the helpers for Avro processing in Hadoop support
secondary sort, custom Writables, or custom WritableComparables.
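To show what I mean by secondary sort, here is the comparator contract I am after, in plain self-contained Java with none of the Avro or Hadoop types (the Key class and names are just my illustration): the sort comparator orders by (name, surname, age) so values reach the reducer age-ordered, while the grouping comparator ignores age so all records for one person land in a single reduce call.

```java
// Plain-Java illustration of the two comparators a secondary sort needs.
// No Hadoop/Avro here; Key stands in for my CustomKey.
import java.util.Arrays;
import java.util.Comparator;

public class SecondarySortDemo {
    static final class Key {
        final String name, surname;
        final long age;
        Key(String name, String surname, long age) {
            this.name = name; this.surname = surname; this.age = age;
        }
        @Override
        public String toString() { return name + " " + surname + ":" + age; }
    }

    // Sort comparator: primary order (name, surname), secondary order age.
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.name)
                  .thenComparing(k -> k.surname)
                  .thenComparingLong(k -> k.age);

    // Grouping comparator: ignores age, so keys differing only in age
    // compare equal and would fall into the same reduce group.
    static final Comparator<Key> GROUP =
        Comparator.comparing((Key k) -> k.name)
                  .thenComparing(k -> k.surname);

    public static void main(String[] args) {
        Key[] keys = {
            new Key("rahul", "s", 30),
            new Key("anita", "k", 25),
            new Key("rahul", "s", 20),
        };
        Arrays.sort(keys, SORT);
        System.out.println(Arrays.toString(keys));
        // prints: [anita k:25, rahul s:20, rahul s:30]
        System.out.println(GROUP.compare(keys[1], keys[2]) == 0);
        // prints: true  (same person despite different ages)
    }
}
```

In Hadoop terms these would become the job's sort comparator and grouping comparator; my question is whether the Avro helper classes let me plug them in.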

Any pointers regarding Avro data processing with Hadoop would be greatly
appreciated.

Please let me know if any more information is required from me.

Thanks,
Rahul